[RFC stage 1] Proposed new warning: -Wmisleading-indentation
David Malcolm
dmalcolm@redhat.com
Thu Apr 16 15:08:00 GMT 2015
Attached is a work-in-progress patch for a new
-Wmisleading-indentation
warning I've been experimenting with, for GCC 6.
It kind-of-works, but there are some issues, so I wanted to get feedback
on it here.
The idea is to issue a warning when the C/C++ compiler's view of the
block structure doesn't match that of a naive human reader of the code
via indentation.
For example, CVE-2014-1266 (aka "goto fail") had:
hashOut.data = hashes + SSL_MD5_DIGEST_LEN;
hashOut.length = SSL_SHA1_DIGEST_LEN;
if ((err = SSLFreeBuffer(&hashCtx)) != 0)
goto fail;
if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
goto fail;
goto fail; /* MISTAKE! THIS LINE SHOULD NOT BE HERE */
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
goto fail;
err = sslRawVerify(...);
where the second "goto fail;" here:
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
goto fail;
goto fail; /* MISTAKE! THIS LINE SHOULD NOT BE HERE */
is indented as if it's guarded by the conditional. It isn't, making the
real meaning of the code hard to discern, and masking a security
vulnerability (where the stray goto led to the certificate-checking
code later in the function being dead code, skipping it with err == 0,
for success).
The patch successfully issues a warning on a simplified version of this:
if ((err = foo (b)) != 0)
goto fail;
goto fail;
./Wmisleading-indentation-6.c: In function âfooâ:
./Wmisleading-indentation-6.c:15:3: warning: statement is indented as if
it were guarded by... [-Wmisleading-indentation]
goto fail;
^
./Wmisleading-indentation-6.c:13:2: note: ...this clause, but it is not
if ((err = foo (b)) != 0)
^
and on various other testcases.
That said, there are several major issues with the patch as it stands:
(A) works with the C FE, but not with C++ yet. The implementation
assumes a single lexer consuming the token stream, so it doesn't yet
work with the C++ FE's ideas of tentatively-consumed tokens
(save/commit/rollback), and with its lexer stack. I *think* that can be
fixed by relying on the fact that the integer locations of the token
stream are normally monotonic, and using that to capture and handle
those two C++ FE impl details.
(B) GTY/PCH/etc, and performance; I know that it leaks visual_block
instances, and there's probably a way to do this with much less dynamic
memory allocation, but I wanted to get it right before making it fast.
(C) no ChangeLog yet, though it's heavily commented.
(D) tabs vs spaces. This is probably the biggest can of worms.
Consider:
{
if (flagA)
if (flagB)
if (flagC)
foo ();
bar (); /* would like to warn about this */
}
If this is indented using spaces&tabs with 8-space tabs, we have:
<no indent>{
<2 spaces>if (flagA)
<4 spaces>if (flagB)
<6 spaces>if (flagC)
<1 tab>foo ();
<1 tab>bar ();
<no indent>}
What does this look like in an editor? Consider emacs, which uses uses
0-based offsets to what I call "visual columns". The column numbers for
the start of the first token for each line are reported by emacs as:
<0>{
<2>if (flagA)
<4>if (flagB)
<6>if (flagC)
<8>foo ();
<8>bar ();
<0>}
However within libcpp and gcc, in linemap's expanded_location and in
diagnostic messages, the "column" numbers are actually 1-based counts of
*characters*, so the "column" numbers emitted in diagnostics for the
start of the first token in each line are actually:
<1>{
<3>if (flagA)
<5>if (flagB)
<7>if (flagC)
<2>foo ();
<2>bar ();
<1>}
Note the transition at tab offsets, where we go from increasing visual
column numbers:
<6>if (flagC)
<8>foo ();
to a decrease in the character count:
<7>if (flagC)
<2>foo ();
Hence to handle mixed spaces+tabs, we need some way to model "visual
columns". I think this is doable (at the cost of adding a field to
cpp_token, and to c_token/cp_token), provided we can assume, per-buffer,
either
(i) a consistent value for tabs in terms of spaces, or
(ii) that we don't have mixed tabs/spaces in this buffer
An issue here is how to determine (i), or if it's OK to default to 8 and
have a command-line option (param?) to override it? (though what about,
say, each header file?)
Thoughts on this, and on the patch?
Thanks
Dave
-------------- next part --------------
A non-text attachment was scrubbed...
Name: misleading-indentation-v1.patch
Type: text/x-patch
Size: 35383 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150416/b2be6da9/attachment.bin>
More information about the Gcc-patches
mailing list