[PATCH] Refactor -Wmisleading-indentation API and extend coverage

Mon Jun 8 19:36:00 GMT 2015

On Mon, Jun 8, 2015 at 2:11 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> On Sun, 2015-06-07 at 16:06 -0400, Patrick Palka wrote:
>> This patch refactors the entry point of -Wmisleading-indentation
>
> Thanks for working on this.  I was hoping to submit a patch to propose
> putting -Wmisleading-indentation into -Wall, and have been testing the
> warning on the linux kernel to try to shake out false positives; hence
> I'm somewhat nervous about this change, though it mostly seems
> reasonable.

Don't worry, I did a bit of testing myself :) This patch combined with
the patch in PR 66454 is remarkably stable -- I could not find bogus
warnings out of the dozen or so C code bases I compiled.  It would to
great to see this warning enabled with -Wall.  Even in its current
form is pretty useful yet pretty simple (all we need is the
information of three tokens).

>
> I think there are three categories of message that
> -Wmisleading-indentation can emit:
>
>   (A) bogus messages, where the message is untrue.  An example of this
> was PR c/66220 (I found an example of this when running it on the linux
> kernel); this is fixed in trunk.
>
>   (B) a message where the code is doing what the author/reviewer
> intended, but the indentation misleadingly suggests a different meaning
> (or perhaps the author is being underhanded c.f.
> http://www.underhanded-c.org/ ).
>
> I think it would be reasonable to warn about (B) in -Wall: invoke.texi
> describes -Wall as
>
>    "...enables all the warnings about constructions that some users
> consider questionable, and that are easy to avoid (or modify to prevent
> the warning), even in conjunction with macros"
>
> hence I believe (B) falls into this category.  Each instance is a
> "gotcha" lurking in the code for the next maintainer.
>
>   (C) a message where the code is doing something differently to what
> the author intended, and the human has been fooled by the bogus
> indentation.  Clearly you'd want to know about (C), but the compiler has
> no way of distinguishing it from (B).

That makes sense.  I call (A) a false positive but that is probably
misleading, "bogus" sounds better.

>
> [FWIW, in my testing on the linux kernel, I ran into:
>
>   * 1 instance of (A) (PR c/66220, now fixed in trunk), and
>
>   * 8 that are each either (B) or (C), and I don't have the domain
> knowledge of the kernel to figure out which is which, so I've started
> reporting them to the upstream linux community; see e.g.:
>     * https://bugzilla.kernel.org/show_bug.cgi?id=98231
>     * https://bugzilla.kernel.org/show_bug.cgi?id=98241
>     * https://bugzilla.kernel.org/show_bug.cgi?id=98251
>     * https://bugzilla.kernel.org/show_bug.cgi?id=98261
>     * https://bugzilla.kernel.org/show_bug.cgi?id=98271
>     * https://bugzilla.kernel.org/show_bug.cgi?id=98281
>
>   * 2 messages that are confirmed as (B), e.g.
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=e1395a321eab1a7833d82e952eb8255e0a1f03cb

I did not stumble upon these warnings in my tests most likely because
I was compiling with a local config which did not include these source
files.

>
> ]
>
> I believe that gcc trunk is currently bootstrappable with
> -Wmisleading-indentation added to -Wall; is this still the case after
> your patch?  Also, would it is it fair to ask for you to test your
> patched compiler on some body of code that isn't gcc itself?

gcc trunk is currently not bootstrappable with
-Wmisleading-indentation because of this misleadingly-indented code:
https://github.com/gcc-mirror/gcc/blob/master/gcc/function.c#L4076

I tested the patch on a number of C code bases: Linux, sqlite,
binutils-gdb, emacs, alpine, bash, zsh, git, tmux, nginx,  and a
couple more.  The patch emits a bogus warning in the Linux and sqlite
code bases, for which I filed PR 66454 before realizing that the issue
was not pre-existing :(  But this patch combined with an updated
version of the patch attached to PR 66454 makes the feature pretty
robust: I could no longer find any instances of bogus (A) warnings
among the set of code bases I tested, though I did find a bunch of
(B)/(C).   And of course our test cases still pass.

>
> More comments inline below, throughout.
>
>> from:
>>
>>   void
>>   warn_for_misleading_indentation (location_t guard_loc,
>>                                    location_t body_loc,
>>                                    location_t next_stmt_loc,
>>                                    enum cpp_ttype next_tok_type,
>>                                    const char *guard_kind);
>>
>> to
>>
>>   struct common_token_info
>>   {
>>     location_t location;
>>     cpp_ttype type;
>>     rid keyword;
>>   };
>
> Bikeshedding a bit, but would, say, "token_indent_info" or
> "token_location_info" be better?  (I'm not sure)  What do you mean by
> "common" in the title?

Sure!  The name is a horrible placeholder.  I used "common" to mean
that the information we are gathering about the tokens is information
that both c_tokens and cp_tokens have.  Something like "token_info"
sounds too general  though.  And "token_location_info" sounds a bit
misleading because it stores more than just location.
"token_ident_info" sounds passable -- and it is a relatively short
name.  I will use this name instead in absence of better suggestions.

>
>>   void
>>   warn_for_misleading_indentation (const common_token_info &guard_tinfo,
>>                                    const common_token_info &body_tinfo,
>>                                    const common_token_info &next_tinfo);
>
> [Do we allow C++ reference syntax?   I'm OK with it, and some of the
> more "C++y" parts of codebase use it (templates), but I think Jeff
> objected last time I tried to submit a patch with it :) ]

I'm not sure.  The use of references is not a big deal to me in this
case at least.  I will just pass three pointers instead.

>
>> The purpose of this refactoring is to expose more information to the
>> -Wmisleading-indentation implementation to allow for more advanced
>> heuristics and for better coverage.
>
>
>> This change of API of course required changes to the C and C++ frontend.
>> Amidst these minor changes I also made sure in both frontends that
>> warn_for_misleading_indentation() gets called even when the body of the
>> guard in question is a semicolon.  This allows us to implement warnings
>> about dubious semicolons.  Also I moved the keyword != RID_ELSE checks
>> guarding the call to warn_for_misleading_indentation() to the function
>> itself itself.
>>
>> At the same time this patch extends the coverage of the
>> -Wmisleading-indentation implementation to catch misleading indentation
>> involving the semicolon.  (These changes are all contained in
>> c-indentation.c.)  For example, now we warn on the following code
>> samples:
>>
>>   if (flag);
>>     foo ();
>>
>>   while (flag);
>>   {
>>     ...
>>   }
>>
>>   if (flag); {
>>     ...
>>   }
>>
>>   if (flag)
>>     ; /* blah */ {
>>       ...
>>     }
>>
>>   if (flag); foo ();
>>
>> while avoiding to warn on code that is poorly indented but not
>> misleadingly so;
>>
>>   while (flag);
>>   foo ();
>>
>>   while (flag)
>>     ;
>>     foo ();
>>
>>   if (flag1)
>>     ;
>>   if (flag)
>>     ;
>>   else
>>    ...
>
> (nods; I want the warning to warn about *misleading* indentation, rather
> than just bad indentation, since the former is reasonable to report
> about; the latter has more risk of becoming tedious and annoying users).

Exactly!

>
>> Finally this patch reverts the fix to PR 66620 because it regresses the
>> following test cases:
>>
>>   if (foo)
>>     bar ();
>>   else if (baz)
>>     bar ();
>>     bar (); // No warning, because guard_vis_column > body_vis_column
>
> The patch doesn't appear to add a testcase specifically for the above;
> the closest I could see was:
>
>> +  if (flagA)
>> +    ;
>> +  else if (flagA) /* { dg-message "8: ...this 'if' clause" } */
>> +    foo (0);
>> +    foo (0); /* { dg-warning "statement is indented as if" } */
>
> (unless I missed it; sorry if that's the case).  Given that the
> heuristics are already non-trivial, could the patch please include a
> testcase specifically for each of the examples you cite in this email?
> I'd rather have duplicate testing than risk missing some cases (it's one
> file, so it shouldn't slow down "make check").

Good point.  The body of the first "if" in the two mentioned test
cases shouldn't matter given the current heuristics, but that may
change in the future.

>
>> The better fix for this PR IMO is to bail out early when the the body is
>> a braced compound statement.  This is what the patch does.  In general,
>> when the body of the guard in question is braced then control flow is
>> visually explicit despite any potentially sloppy indentation.  This
>> restriction could always be relaxed however.
>
> This seems reasonable to me.

The problem is that the revert of the original fix for that PR causes
regressions as documented in PR 66454.  The next iteration of this
patch will be a pair of patches that are regression-free (using the
attached fix in PR 66454).

>
>> Bootstrap and regtest on x86_64-linux in progress, OK to commit if succeeds?
>>
>> gcc/c-family/ChangeLog:
>>
>>       * c-common.h (struct common_token_info): Define.
>>       (extract_token_info): Define.
>>       (warn_for_misleading_information): Change declaration to take
>>       three common_token_infos.
>>       * c-identation.c (should_warn_for_misleading_indentation):
>>       Likewise.  Bail out early if the body is a compound statement or
>>       if the next token is the else keyword.  Float out the definition
>>       of guard_exploc.  Don't warn on code containing a semicolon body
>>       on its own line.  Remove fix for PR 66220.  Add heuristics for
>>       when a semicolon body is on the same line as the guard.
>>       (guard_tinfo_to_string): Define.
>>       (warn_for_misleading_indentation): Change declaration to take
>>       three common_token_infos.  Adjust accordingly.
>>
>> gcc/c/ChangeLog:
>>
>>       * c-parser.c (c_parser_if_body): Call
>>       warn_for_misleading_indentation even when the body is a
>>       semicolon.  Extract common_token_infos corresponding to the
>>       guard, body and next tokens.  Adjust call to
>>       warn_for_misleading_indentation accordingly.
>>       (c_parser_else_body): Likewise.
>>       (c_parser_if_statement): Likewise.
>>       (c_parser_while_statement): Likewise.
>>       (c_parser_for_statement): Likewise.
>>
>> gcc/cp/ChangeLog:
>>
>>       * parser.c (cp_parser_selection_statement): Move handling of
>>       semicolon body to ...
>>       (cp_parser_implicitly_scoped_statement): .. here.  Call
>>       warn_for_misleading_indentation even when the body is a
>>       semicolon.  Extract common_token_infos corresponding to the
>>       guard, body and next tokens.  Adjust call to
>>       warn_for_misleading_indentation accordingly.
>>       (cp_parser_already_scoped_statement): Likewise.
>>       (cp_parser_selection_statement, cp_parser_iteration_statement):
>>       Extract a common_token_info corresponding to the guard token.
>>
>> gcc/testsuite/ChangeLog:
>>
>>       * c-c++-common/Wmisleading-indentation.c: Augment test.
>> ---
>>  gcc/c-family/c-common.h                            |  34 ++++-
>>  gcc/c-family/c-indentation.c                       | 144 ++++++++++++++++-----
>>  gcc/c/c-parser.c                                   |  82 ++++++------
>>  gcc/cp/parser.c                                    |  99 ++++++--------
>>  .../c-c++-common/Wmisleading-indentation.c         |  99 ++++++++++++++
>>  5 files changed, 325 insertions(+), 133 deletions(-)
>>
>> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
>> index af86e26..f152de6 100644
>> --- a/gcc/c-family/c-common.h
>> +++ b/gcc/c-family/c-common.h
>> @@ -1435,12 +1435,36 @@ extern bool contains_cilk_spawn_stmt (tree);
>>  extern tree cilk_for_number_of_iterations (tree);
>>  extern bool check_no_cilk (tree, const char *, const char *,
>>                          location_t loc = UNKNOWN_LOCATION);
>> +
>> +
>> +/* A subset of token information that is common between the C and C++ frontends.  */
>> +
>> +struct common_token_info
>> +{
>> +  location_t location;
>> +  ENUM_BITFIELD (cpp_ttype) type : 8;
>> +  ENUM_BITFIELD (rid) keyword : 8;
>> +};
>> +
>> +/* Extract common token information from TOKEN, which ought to either be a
>> +   cp_token * or a c_token *.  */
>> +
>> +template <typename T>
>> +inline common_token_info
>> +extract_token_info (const T *token)
>> +{
>> +  common_token_info info;
>> +  info.location = token->location;
>> +  info.type = token->type;
>> +  info.keyword = token->keyword;
>> +
>> +  return info;
>> +}
>> +
>>  /* In c-indentation.c.  */
>>  extern void
>> -warn_for_misleading_indentation (location_t guard_loc,
>> -                              location_t body_loc,
>> -                              location_t next_stmt_loc,
>> -                              enum cpp_ttype next_tok_type,
>> -                              const char *guard_kind);
>> +warn_for_misleading_indentation (const common_token_info &guard_tinfo,
>> +                              const common_token_info &body_tinfo,
>> +                              const common_token_info &next_tinfo);
>>
>>  #endif /* ! GCC_C_COMMON_H */
>> diff --git a/gcc/c-family/c-indentation.c b/gcc/c-family/c-indentation.c
>> index 5c78e18..d3e842b 100644
>> --- a/gcc/c-family/c-indentation.c
>> +++ b/gcc/c-family/c-indentation.c
>> @@ -190,11 +190,17 @@ detect_preprocessor_logic (expanded_location body_exploc,
>>     description of that function below.  */
>>
>>  static bool
>> -should_warn_for_misleading_indentation (location_t guard_loc,
>> -                                     location_t body_loc,
>> -                                     location_t next_stmt_loc,
>> -                                     enum cpp_ttype next_tok_type)
>> +should_warn_for_misleading_indentation (const common_token_info &guard_tinfo,
>> +                                     const common_token_info &body_tinfo,
>> +                                     const common_token_info &next_tinfo)
>>  {
>> +  location_t guard_loc = guard_tinfo.location;
>> +  location_t body_loc = body_tinfo.location;
>> +  location_t next_stmt_loc = next_tinfo.location;
>> +
>> +  enum cpp_ttype body_type = body_tinfo.type;
>> +  enum cpp_ttype next_tok_type = next_tinfo.type;
>> +
>>    /* Don't attempt to compare the indentation of BODY_LOC and NEXT_STMT_LOC
>>       if either are within macros.  */
>>    if (linemap_location_from_macro_expansion_p (line_table, body_loc)
>> @@ -220,7 +226,9 @@ should_warn_for_misleading_indentation (location_t guard_loc,
>>    if (line_table->seen_line_directive)
>>      return false;
>>
>> -  if (next_tok_type == CPP_CLOSE_BRACE)
>> +  if (body_type == CPP_OPEN_BRACE
>> +      || next_tok_type == CPP_CLOSE_BRACE
>> +      || next_tinfo.keyword == RID_ELSE)
>>      return false;
>
> Perhaps it's hypocritical of me to ask this given that it didn't have
> one already, but could the above guard gain a comment describing the
> intent?  (ideally with examples of what we reject here?)

Will do.

>
>>    /* Don't warn here about spurious semicolons.  */
>> @@ -229,6 +237,7 @@ should_warn_for_misleading_indentation (location_t guard_loc,
>>
>>    expanded_location body_exploc = expand_location (body_loc);
>>    expanded_location next_stmt_exploc = expand_location (next_stmt_loc);
>> +  expanded_location guard_exploc = expand_location (guard_loc);
>>
>>    /* They must be in the same file.  */
>>    if (next_stmt_exploc.file != body_exploc.file)
>> @@ -246,13 +255,20 @@ should_warn_for_misleading_indentation (location_t guard_loc,
>>         if (flag) foo (); bar ();
>>                           ^ WARN HERE
>>
>> +
>> +       if (flag) ; {
>> +                   ^ WARN HERE
>> +
>> +       if (flag)
>> +        ; {
>> +          ^ WARN HERE
>> +
>>       Cases where we don't want to issue a warning:
>>
>>         various_code (); if (flag) foo (); bar (); more_code ();
>>                                            ^ DON'T WARN HERE.  */
>>    if (next_stmt_exploc.line == body_exploc.line)
>>      {
>> -      expanded_location guard_exploc = expand_location (guard_loc);
>>        if (guard_exploc.file != body_exploc.file)
>>       return true;
>>        if (guard_exploc.line < body_exploc.line)
>> @@ -294,14 +310,10 @@ should_warn_for_misleading_indentation (location_t guard_loc,
>>         bar ();
>>         ^ DON'T WARN HERE
>>
>> -        if (flag) {
>> -          foo ();
>> -        } else
>> -        {
>> -          bar ();
>> -        }
>> -        baz ();
>> -        ^ DON'T WARN HERE
>
> I notice you delete part of the comment here; where is this case now
> rejected?  Can that go in a comment please?  (I believe that it's now at
> the clause I highlighted above)

Will do.

>
>> +     if (flag)
>> +       ;
>> +       foo ();
>> +       ^ DON'T WARN HERE
>>    */
>>    if (next_stmt_exploc.line > body_exploc.line)
>>      {
>> @@ -317,23 +329,22 @@ should_warn_for_misleading_indentation (location_t guard_loc,
>>       return false;
>>        if (!get_visual_column (body_exploc, &body_vis_column))
>>       return false;
>> -      if (next_stmt_vis_column == body_vis_column)
>> +
>> +      /* Don't emit a warning when the semicolon body is on its own line.  */
>> +      if (body_exploc.line > guard_exploc.line
>> +       && body_type == CPP_SEMICOLON)
>> +     ;
>> +       else if (next_stmt_vis_column == body_vis_column)
>>       {
>>         /* Don't warn if they aren't aligned on the same column
>>            as the guard itself (suggesting autogenerated code that
>>            doesn't bother indenting at all).  */
>> -       expanded_location guard_exploc = expand_location (guard_loc);
>>         unsigned int guard_vis_column;
>>         if (!get_visual_column (guard_exploc, &guard_vis_column))
>>           return false;
>>         if (guard_vis_column == body_vis_column)
>>           return false;
>>
>> -       /* PR 66220: Don't warn if the guarding statement is more
>> -          indented than the next/body stmts.  */
>> -       if (guard_vis_column > body_vis_column)
>> -         return false;
>> -
>>         /* Don't warn if there is multiline preprocessor logic between
>>            the two statements. */
>>         if (detect_preprocessor_logic (body_exploc, next_stmt_exploc))
>> @@ -342,11 +353,76 @@ should_warn_for_misleading_indentation (location_t guard_loc,
>>         /* Otherwise, they are visually aligned: issue a warning.  */
>>         return true;
>>       }
>> +
>> +     /* Also issue a warning for code having the form:
>> +
>> +        if (flag);
>> +          foo ();
>> +
>> +        while (flag);
>> +        {
>> +          ...
>> +        }
>> +
>> +        for (...);
>> +          {
>> +            ...
>> +          }
>> +
>> +        if (flag)
>> +          ;
>> +        else if (flag);
>> +          foo ();
>> +
>> +        where the semicolon at the end of each guard is most likely spurious.
>> +
>> +        But do not warn on:
>> +
>> +        for (..);
>> +        foo ();
>> +
>> +        where the next statement is aligned with the guard.
>> +     */
>> +     if (body_type == CPP_SEMICOLON)
>> +       {
>> +         if (body_exploc.line == guard_exploc.line)
>> +           {
>> +             unsigned int guard_vis_column;
>> +             if (!get_visual_column (guard_exploc, &guard_vis_column))
>> +               return false;
>> +             if (next_stmt_vis_column != guard_vis_column
>> +                 || (next_tok_type == CPP_OPEN_BRACE
>> +                     && next_stmt_vis_column == guard_vis_column))
>> +               return true;
>> +           }
>> +       }
>>      }
>>
>>    return false;
>>  }
>>
>> +/* Return the string indentifier corresponding to the guard token.  */
>
> Typo: s/indentifier/identifier/

Dang.

>
>
>> +static const char *
>> +guard_tinfo_to_string (const common_token_info &guard_tinfo)
>> +{
>> +  switch (guard_tinfo.keyword)
>> +    {
>> +    case RID_FOR:
>> +      return "for";
>> +    case RID_ELSE:
>> +      return "else";
>> +    case RID_IF:
>> +      return "if";
>> +    case RID_WHILE:
>> +      return "while";
>> +    case RID_DO:
>> +      return "do";
>> +    default:
>> +      return "";
>
> Should the default have a "gcc_unreachable ();", or is this indeed
> reachable?

It currently should not be reachable.  I will change it use gcc_unreachable ().

>
>> +    }
>> +}
>> +
>>  /* Called by the C/C++ frontends when we have a guarding statement at
>>     GUARD_LOC containing a statement at BODY_LOC, where the block wasn't
>>     written using braces, like this:
>> @@ -374,11 +450,9 @@ should_warn_for_misleading_indentation (location_t guard_loc,
>>     GUARD_KIND identifies the kind of clause e.g. "if", "else" etc.  */
>>
>>  void
>> -warn_for_misleading_indentation (location_t guard_loc,
>> -                              location_t body_loc,
>> -                              location_t next_stmt_loc,
>> -                              enum cpp_ttype next_tok_type,
>> -                              const char *guard_kind)
>> +warn_for_misleading_indentation (const common_token_info &guard_tinfo,
>> +                              const common_token_info &body_tinfo,
>> +                              const common_token_info &next_tinfo)
>>  {
>>    /* Early reject for the case where -Wmisleading-indentation is disabled,
>>       to avoid doing work only to have the warning suppressed inside the
>> @@ -386,12 +460,14 @@ warn_for_misleading_indentation (location_t guard_loc,
>>    if (!warn_misleading_indentation)
>>      return;
>>
>> -  if (should_warn_for_misleading_indentation (guard_loc,
>> -                                           body_loc,
>> -                                           next_stmt_loc,
>> -                                           next_tok_type))
>> -    if (warning_at (next_stmt_loc, OPT_Wmisleading_indentation,
>> -                 "statement is indented as if it were guarded by..."))
>> -      inform (guard_loc,
>> -           "...this %qs clause, but it is not", guard_kind);
>> +  if (should_warn_for_misleading_indentation (guard_tinfo,
>> +                                           body_tinfo,
>> +                                           next_tinfo))
>> +    {
>> +      if (warning_at (next_tinfo.location, OPT_Wmisleading_indentation,
>> +                   "statement is indented as if it were guarded by..."))
>> +        inform (guard_tinfo.location,
>> +             "...this %qs clause, but it is not",
>> +             guard_tinfo_to_string (guard_tinfo));
>> +    }
>>  }
>> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
>> index c691e76..7705bd4 100644
>> --- a/gcc/c/c-parser.c
>> +++ b/gcc/c/c-parser.c
>> @@ -5193,10 +5193,14 @@ c_parser_c99_block_statement (c_parser *parser)
>>     parser->in_if_block.  */
>>
>>  static tree
>> -c_parser_if_body (c_parser *parser, bool *if_p, location_t if_loc)
>> +c_parser_if_body (c_parser *parser, bool *if_p,
>> +               const common_token_info &if_tinfo)
>>  {
>>    tree block = c_begin_compound_stmt (flag_isoc99);
>>    location_t body_loc = c_parser_peek_token (parser)->location;
>> +  common_token_info body_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>> +
>>    c_parser_all_labels (parser);
>>    *if_p = c_parser_next_token_is_keyword (parser, RID_IF);
>>    if (c_parser_next_token_is (parser, CPP_SEMICOLON))
>> @@ -5211,14 +5215,11 @@ c_parser_if_body (c_parser *parser, bool *if_p, location_t if_loc)
>>    else if (c_parser_next_token_is (parser, CPP_OPEN_BRACE))
>>      add_stmt (c_parser_compound_statement (parser));
>>    else
>> -    {
>> -      c_parser_statement_after_labels (parser);
>> -      if (!c_parser_next_token_is_keyword (parser, RID_ELSE))
>> -     warn_for_misleading_indentation (if_loc, body_loc,
>> -                                      c_parser_peek_token (parser)->location,
>> -                                      c_parser_peek_token (parser)->type,
>> -                                      "if");
>> -    }
>> +    c_parser_statement_after_labels (parser);
>> +
>> +  common_token_info next_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>> +  warn_for_misleading_indentation (if_tinfo, body_tinfo, next_tinfo);
>>
>>    return c_end_compound_stmt (body_loc, block, flag_isoc99);
>>  }
>> @@ -5228,28 +5229,30 @@ c_parser_if_body (c_parser *parser, bool *if_p, location_t if_loc)
>>     specially for the sake of -Wempty-body warnings.  */
>>
>>  static tree
>> -c_parser_else_body (c_parser *parser, location_t else_loc)
>> +c_parser_else_body (c_parser *parser, const common_token_info &else_tinfo)
>>  {
>>    location_t body_loc = c_parser_peek_token (parser)->location;
>>    tree block = c_begin_compound_stmt (flag_isoc99);
>> +  common_token_info body_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>> +
>>    c_parser_all_labels (parser);
>>    if (c_parser_next_token_is (parser, CPP_SEMICOLON))
>>      {
>>        location_t loc = c_parser_peek_token (parser)->location;
>>        warning_at (loc,
>>                 OPT_Wempty_body,
>> -              "suggest braces around empty body in an %<else%> statement");
>> +               "suggest braces around empty body in an %<else%> statement");
>
> Was this purely a whitespace change?  If so, probably leave it for a
> different commit.

Yep.  I'll leave it out.

>
>>        add_stmt (build_empty_stmt (loc));
>>        c_parser_consume_token (parser);
>>      }
>>    else
>> -    {
>> -      c_parser_statement_after_labels (parser);
>> -      warn_for_misleading_indentation (else_loc, body_loc,
>> -                                    c_parser_peek_token (parser)->location,
>> -                                    c_parser_peek_token (parser)->type,
>> -                                    "else");
>> -    }
>> +    c_parser_statement_after_labels (parser);
>> +
>> +  common_token_info next_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>> +  warn_for_misleading_indentation (else_tinfo, body_tinfo, next_tinfo);
>> +
>>    return c_end_compound_stmt (body_loc, block, flag_isoc99);
>>  }
>>
>> @@ -5272,7 +5275,8 @@ c_parser_if_statement (c_parser *parser)
>>    tree if_stmt;
>>
>>    gcc_assert (c_parser_next_token_is_keyword (parser, RID_IF));
>> -  location_t if_loc = c_parser_peek_token (parser)->location;
>> +  common_token_info if_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>>    c_parser_consume_token (parser);
>>    block = c_begin_compound_stmt (flag_isoc99);
>>    loc = c_parser_peek_token (parser)->location;
>> @@ -5284,13 +5288,14 @@ c_parser_if_statement (c_parser *parser)
>>      }
>>    in_if_block = parser->in_if_block;
>>    parser->in_if_block = true;
>> -  first_body = c_parser_if_body (parser, &first_if, if_loc);
>> +  first_body = c_parser_if_body (parser, &first_if, if_tinfo);
>>    parser->in_if_block = in_if_block;
>>    if (c_parser_next_token_is_keyword (parser, RID_ELSE))
>>      {
>> -      location_t else_loc = c_parser_peek_token (parser)->location;
>> +      common_token_info else_tinfo
>> +     = extract_token_info (c_parser_peek_token (parser));
>>        c_parser_consume_token (parser);
>> -      second_body = c_parser_else_body (parser, else_loc);
>> +      second_body = c_parser_else_body (parser, else_tinfo);
>>      }
>>    else
>>      second_body = NULL_TREE;
>> @@ -5370,7 +5375,8 @@ c_parser_while_statement (c_parser *parser, bool ivdep)
>>    tree block, cond, body, save_break, save_cont;
>>    location_t loc;
>>    gcc_assert (c_parser_next_token_is_keyword (parser, RID_WHILE));
>> -  location_t while_loc = c_parser_peek_token (parser)->location;
>> +  common_token_info while_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>>    c_parser_consume_token (parser);
>>    block = c_begin_compound_stmt (flag_isoc99);
>>    loc = c_parser_peek_token (parser)->location;
>> @@ -5388,14 +5394,14 @@ c_parser_while_statement (c_parser *parser, bool ivdep)
>>    save_cont = c_cont_label;
>>    c_cont_label = NULL_TREE;
>>
>> -  location_t body_loc = UNKNOWN_LOCATION;
>> -  if (c_parser_peek_token (parser)->type != CPP_OPEN_BRACE)
>> -    body_loc = c_parser_peek_token (parser)->location;
>> +  common_token_info body_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>> +
>>    body = c_parser_c99_block_statement (parser);
>> -  warn_for_misleading_indentation (while_loc, body_loc,
>> -                                c_parser_peek_token (parser)->location,
>> -                                c_parser_peek_token (parser)->type,
>> -                                "while");
>> +
>> +  common_token_info next_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>> +  warn_for_misleading_indentation (while_tinfo, body_tinfo, next_tinfo);
>>
>>    c_finish_loop (loc, cond, NULL, body, c_break_label, c_cont_label, true);
>>    add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
>> @@ -5515,6 +5521,8 @@ c_parser_for_statement (c_parser *parser, bool ivdep)
>>    location_t for_loc = c_parser_peek_token (parser)->location;
>>    bool is_foreach_statement = false;
>>    gcc_assert (c_parser_next_token_is_keyword (parser, RID_FOR));
>> +  common_token_info for_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>>    c_parser_consume_token (parser);
>>    /* Open a compound statement in Objective-C as well, just in case this is
>>       as foreach expression.  */
>> @@ -5675,14 +5683,14 @@ c_parser_for_statement (c_parser *parser, bool ivdep)
>>    save_cont = c_cont_label;
>>    c_cont_label = NULL_TREE;
>>
>> -  location_t body_loc = UNKNOWN_LOCATION;
>> -  if (c_parser_peek_token (parser)->type != CPP_OPEN_BRACE)
>> -    body_loc = c_parser_peek_token (parser)->location;
>> +  common_token_info body_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>> +
>>    body = c_parser_c99_block_statement (parser);
>> -  warn_for_misleading_indentation (for_loc, body_loc,
>> -                                c_parser_peek_token (parser)->location,
>> -                                c_parser_peek_token (parser)->type,
>> -                                "for");
>> +
>> +  common_token_info next_tinfo
>> +    = extract_token_info (c_parser_peek_token (parser));
>> +  warn_for_misleading_indentation (for_tinfo, body_tinfo, next_tinfo);
>>
>>    if (is_foreach_statement)
>>      objc_finish_foreach_loop (loc, object_expression, collection_expression, body, c_break_label, c_cont_label);
>> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
>> index 2ea61c0..a220f3e 100644
>> --- a/gcc/cp/parser.c
>> +++ b/gcc/cp/parser.c
>> @@ -2062,9 +2062,9 @@ static void cp_parser_declaration_statement
>>    (cp_parser *);
>>
>>  static tree cp_parser_implicitly_scoped_statement
>> -  (cp_parser *, bool *, location_t, const char *);
>> +  (cp_parser *, bool *, const common_token_info &);
>>  static void cp_parser_already_scoped_statement
>> -  (cp_parser *, location_t, const char *);
>> +  (cp_parser *, const common_token_info &);
>>
>>  /* Declarations [gram.dcl.dcl] */
>>
>> @@ -10112,12 +10112,14 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p)
>>  {
>>    cp_token *token;
>>    enum rid keyword;
>> +  common_token_info guard_tinfo;
>>
>>    if (if_p != NULL)
>>      *if_p = false;
>>
>>    /* Peek at the next token.  */
>>    token = cp_parser_require (parser, CPP_KEYWORD, RT_SELECT);
>> +  guard_tinfo = extract_token_info (token);
>>
>>    /* See what kind of keyword it is.  */
>>    keyword = token->keyword;
>> @@ -10160,19 +10162,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p)
>>           /* Parse the then-clause.  */
>>           in_statement = parser->in_statement;
>>           parser->in_statement |= IN_IF_STMT;
>> -         if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
>> -           {
>> -             location_t loc = cp_lexer_peek_token (parser->lexer)->location;
>> -             add_stmt (build_empty_stmt (loc));
>> -             cp_lexer_consume_token (parser->lexer);
>> -             if (!cp_lexer_next_token_is_keyword (parser->lexer, RID_ELSE))
>> -               warning_at (loc, OPT_Wempty_body, "suggest braces around "
>> -                           "empty body in an %<if%> statement");
>> -             nested_if = false;
>> -           }
>> -         else
>> -           cp_parser_implicitly_scoped_statement (parser, &nested_if,
>> -                                                  token->location, "if");
>> +         cp_parser_implicitly_scoped_statement (parser, &nested_if,
>> +                                                guard_tinfo);
>>           parser->in_statement = in_statement;
>>
>>           finish_then_clause (statement);
>> @@ -10181,24 +10172,14 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p)
>>           if (cp_lexer_next_token_is_keyword (parser->lexer,
>>                                               RID_ELSE))
>>             {
>> +             guard_tinfo
>> +               = extract_token_info (cp_lexer_peek_token (parser->lexer));
>>               /* Consume the `else' keyword.  */
>> -             location_t else_tok_loc
>> -               = cp_lexer_consume_token (parser->lexer)->location;
>> +             cp_lexer_consume_token (parser->lexer);
>>               begin_else_clause (statement);
>>               /* Parse the else-clause.  */
>> -             if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
>> -               {
>> -                 location_t loc;
>> -                 loc = cp_lexer_peek_token (parser->lexer)->location;
>> -                 warning_at (loc,
>> -                             OPT_Wempty_body, "suggest braces around "
>> -                             "empty body in an %<else%> statement");
>> -                 add_stmt (build_empty_stmt (loc));
>> -                 cp_lexer_consume_token (parser->lexer);
>> -               }
>> -             else
>> -               cp_parser_implicitly_scoped_statement (parser, NULL,
>> -                                                      else_tok_loc, "else");
>> +             cp_parser_implicitly_scoped_statement (parser, NULL,
>> +                                                    guard_tinfo);
>>
>>               finish_else_clause (statement);
>>
>> @@ -10239,7 +10220,7 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p)
>>           parser->in_switch_statement_p = true;
>>           parser->in_statement |= IN_SWITCH_STMT;
>>           cp_parser_implicitly_scoped_statement (parser, NULL,
>> -                                                0, "switch");
>> +                                                guard_tinfo);
>>           parser->in_switch_statement_p = in_switch_statement_p;
>>           parser->in_statement = in_statement;
>>
>> @@ -10784,17 +10765,17 @@ static tree
>>  cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
>>  {
>>    cp_token *token;
>> -  location_t tok_loc;
>>    enum rid keyword;
>>    tree statement;
>>    unsigned char in_statement;
>> +  common_token_info guard_tinfo;
>>
>>    /* Peek at the next token.  */
>>    token = cp_parser_require (parser, CPP_KEYWORD, RT_INTERATION);
>>    if (!token)
>>      return error_mark_node;
>>
>> -  tok_loc = token->location;
>> +  guard_tinfo = extract_token_info (token);
>>
>>    /* Remember whether or not we are already within an iteration
>>       statement.  */
>> @@ -10819,7 +10800,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
>>       cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
>>       /* Parse the dependent statement.  */
>>       parser->in_statement = IN_ITERATION_STMT;
>> -     cp_parser_already_scoped_statement (parser, tok_loc, "while");
>> +     cp_parser_already_scoped_statement (parser, guard_tinfo);
>>       parser->in_statement = in_statement;
>>       /* We're done with the while-statement.  */
>>       finish_while_stmt (statement);
>> @@ -10834,7 +10815,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
>>       statement = begin_do_stmt ();
>>       /* Parse the body of the do-statement.  */
>>       parser->in_statement = IN_ITERATION_STMT;
>> -     cp_parser_implicitly_scoped_statement (parser, NULL, 0, "do");
>> +     cp_parser_implicitly_scoped_statement (parser, NULL, guard_tinfo);
>>       parser->in_statement = in_statement;
>>       finish_do_body (statement);
>>       /* Look for the `while' keyword.  */
>> @@ -10864,7 +10845,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
>>
>>       /* Parse the body of the for-statement.  */
>>       parser->in_statement = IN_ITERATION_STMT;
>> -     cp_parser_already_scoped_statement (parser, tok_loc, "for");
>> +     cp_parser_already_scoped_statement (parser, guard_tinfo);
>>       parser->in_statement = in_statement;
>>
>>       /* We're done with the for-statement.  */
>> @@ -11134,10 +11115,12 @@ cp_parser_declaration_statement (cp_parser* parser)
>>
>>  static tree
>>  cp_parser_implicitly_scoped_statement (cp_parser* parser, bool *if_p,
>> -                                    location_t guard_loc,
>> -                                    const char *guard_kind)
>> +                                    const common_token_info &guard_tinfo)
>>  {
>>    tree statement;
>> +  location_t body_loc = cp_lexer_peek_token (parser->lexer)->location;
>> +  common_token_info body_tinfo
>> +    = extract_token_info (cp_lexer_peek_token (parser->lexer));
>>
>>    if (if_p != NULL)
>>      *if_p = false;
>> @@ -11145,9 +11128,16 @@ cp_parser_implicitly_scoped_statement (cp_parser* parser, bool *if_p,
>>    /* Mark if () ; with a special NOP_EXPR.  */
>>    if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
>>      {
>> -      location_t loc = cp_lexer_peek_token (parser->lexer)->location;
>>        cp_lexer_consume_token (parser->lexer);
>> -      statement = add_stmt (build_empty_stmt (loc));
>> +      statement = add_stmt (build_empty_stmt (body_loc));
>> +
>> +      if (guard_tinfo.keyword == RID_IF
>> +       && !cp_lexer_next_token_is_keyword (parser->lexer, RID_ELSE))
>> +     warning_at (body_loc, OPT_Wempty_body,
>> +                 "suggest braces around empty body in an %<if%> statement");
>> +      else if (guard_tinfo.keyword == RID_ELSE)
>> +     warning_at (body_loc, OPT_Wempty_body,
>> +                 "suggest braces around empty body in an %<else%> statement");
>>      }
>>    /* if a compound is opened, we simply parse the statement directly.  */
>>    else if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
>> @@ -11158,20 +11148,15 @@ cp_parser_implicitly_scoped_statement (cp_parser* parser, bool *if_p,
>>        /* Create a compound-statement.  */
>>        statement = begin_compound_stmt (0);
>>        /* Parse the dependent-statement.  */
>> -      location_t body_loc = cp_lexer_peek_token (parser->lexer)->location;
>>        cp_parser_statement (parser, NULL_TREE, false, if_p);
>>        /* Finish the dummy compound-statement.  */
>>        finish_compound_stmt (statement);
>> -      cp_token *next_tok = cp_lexer_peek_token (parser->lexer);
>> -      if (next_tok->keyword != RID_ELSE)
>> -        {
>> -          location_t next_stmt_loc = next_tok->location;
>> -          warn_for_misleading_indentation (guard_loc, body_loc,
>> -                                           next_stmt_loc, next_tok->type,
>> -                                           guard_kind);
>> -        }
>>      }
>>
>> +  common_token_info next_tinfo
>> +    = extract_token_info (cp_lexer_peek_token (parser->lexer));
>> +  warn_for_misleading_indentation (guard_tinfo, body_tinfo, next_tinfo);
>> +
>>    /* Return the statement.  */
>>    return statement;
>>  }
>> @@ -11182,19 +11167,19 @@ cp_parser_implicitly_scoped_statement (cp_parser* parser, bool *if_p,
>>     scope.  */
>>
>>  static void
>> -cp_parser_already_scoped_statement (cp_parser* parser, location_t guard_loc,
>> -                                 const char *guard_kind)
>> +cp_parser_already_scoped_statement (cp_parser* parser,
>> +                                 const common_token_info &guard_tinfo)
>>  {
>>    /* If the token is a `{', then we must take special action.  */
>>    if (cp_lexer_next_token_is_not (parser->lexer, CPP_OPEN_BRACE))
>>      {
>> -      location_t body_loc = cp_lexer_peek_token (parser->lexer)->location;
>> +      common_token_info body_tinfo
>> +     = extract_token_info (cp_lexer_peek_token (parser->lexer));
>> +
>>        cp_parser_statement (parser, NULL_TREE, false, NULL);
>> -      cp_token *next_tok = cp_lexer_peek_token (parser->lexer);
>> -      location_t next_stmt_loc = next_tok->location;
>> -      warn_for_misleading_indentation (guard_loc, body_loc,
>> -                                       next_stmt_loc, next_tok->type,
>> -                                       guard_kind);
>> +      common_token_info next_tinfo
>> +     = extract_token_info (cp_lexer_peek_token (parser->lexer));
>> +      warn_for_misleading_indentation (guard_tinfo, body_tinfo, next_tinfo);
>>      }
>>    else
>>      {
>> diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation.c
>> index 443e3dd..e8e04e3 100644
>> --- a/gcc/testsuite/c-c++-common/Wmisleading-indentation.c
>> +++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation.c
>> @@ -691,3 +691,102 @@ fn_36 (void)
>>       }
>>       foo(6); /* We shouldn't warn here.  */
>>  }
>> +
>> +void
>> +fn_37 (void)
>> +{
>> +  int i;
>> +
>> +#define EMPTY
>> +#define FOR_EACH(VAR, START, STOP) for (VAR = START; VAR < STOP; VAR++)
>> +
>> +  while (flagA); /* { dg-message "3: ...this 'while' clause" } */
>> +    foo (0); /* { dg-warning "statement is indented as if" } */
>> +
>> +  if (flagA)
>> +    ;
>> +  else if (flagA); /* { dg-message "8: ...this 'if' clause" } */
>> +    foo (0); /* { dg-warning "statement is indented as if" } */
>> +
>> +  while (flagA)
>> +    ;
>> +    foo (0);
>> +
>> +  if (flagA)
>> +    ;
>> +  else if (flagA) /* { dg-message "8: ...this 'if' clause" } */
>> +    foo (0);
>> +    foo (0); /* { dg-warning "statement is indented as if" } */
>> +
>> +  if (flagA)
>> +    ;
>> +
>> +  if (flagB)
>> +    ;
>> +
>> +  if (flagB)
>> +    ;
>> +    {
>> +      foo (0);
>> +    }
>> +
>> +  if (flagB)
>> +    ;
>> +  else; foo (0); /* { dg-warning "statement is indented as if" } */
>> +
>> +  if (flagC); foo (2); /* { dg-warning "statement is indented as if" } */
>> +
>> +  if (flagA)
>> +    ; /* blah */ { /* { dg-warning "statement is indented as if" } */
>> +      foo (1);
>> +    }
>> +
>> +  if (flagB) ; /* { dg-message "3: ...this 'if' clause" } */
>> +    return; /* { dg-warning "statement is indented as if" } */
>> +
>> +  if (flagB) EMPTY; /* { dg-message "3: ...this 'if' clause" } */
>> +    foo (1); /* { dg-warning "statement is indented as if" } */
>> +
>> +  while (flagC);
>> +  foo (2);
>> +
>> +  for (i = 0; i < 10; i++); /* { dg-message "3: ...this 'for' clause" } */
>> +    foo (2); /* { dg-warning "statement is indented as if" } */
>> +
>> +  FOR_EACH (i, 0, 10);
>> +    foo (2); /* { dg-warning "statement is indented as if" } */
>> +
>> +  FOR_EACH (i, 0, 10);
>> +    { /* { dg-warning "statement is indented as if" } */
>> +      foo (3);
>> +    }
>> +
>> +  FOR_EACH (i, 0, 10);
>> +  { /* { dg-warning "statement is indented as if" } */
>> +    foo (3);
>> +  }
>> +
>> +  if (flagA)
>> +    while (flagC++);
>> +  else
>> +    ;
>> +
>> +  if (flagA)
>> +    if (flagB)
>> +      foo (0);
>> +    else
>> +      foo (0);
>> +  else
>> +    foo (0);
>> +
>> +  while (i++); { /* { dg-warning "statement is indented as if" } */
>> +    foo (3);
>> +  }
>> +
>> +  if (i++); { /* { dg-warning "statement is indented as if" } */
>> +    foo (3);
>> +  }
>> +
>> +#undef EMPTY
>> +#undef FOR_EACH
>> +}
>
> Would it be good to separate out these into two categories (e.g. as two
> functions): misleading indentation vs merely bad indentation, and have a
> comment covering the latter?  (I like to have comments in my testcases,
> though I may be in the minority here)

Sure.

>
> Thanks; hope the above is reasonable.
> Dave
>