[RFC stage 1] Proposed new warning: -Wmisleading-indentation

Manuel López-Ibáñez lopezibanez@gmail.com
Thu Apr 16 18:29:00 GMT 2015


On 16/04/15 17:01, David Malcolm wrote:
> Attached is a work-in-progress patch for a new
>    -Wmisleading-indentation
> warning I've been experimenting with, for GCC 6.

It sounds very cool...

> (D) tabs vs spaces.  This is probably the biggest can of worms.

I would suggest to be very conservative when warning. In case of doubt, do not 
warn.

> An issue here is how to determine (i), or if it's OK to default to 8 and
> have a command-line option (param?) to override it? (though what about,
> say, each header file?)

In case of a potential warning, you could detect if there is a mix of tab and 
spaces for the lines affected that may make the warning bogus, and not warn in 
that case. That is,

<6 spaces>if (flagC)
<1 tab><space>foo ();
<1 tab><space>bar ();

warns but

<6 spaces>if (flagC)
<8 spaces>foo ();
<2   tabs>bar ();

does not because you don't know how many spaces those 2 tabs take.

Also, in order to detect whether something is a tab or a space, why not simply 
re-open and seek? The input file is probably already in memory. See how 
diagnostic_show_locus in diagnostics.c gets lines from given locations.

Not even Emacs, which is a GNU package, can get tabs vs. spaces right and 
compile-goto-error goes to the wrong column. That seems a can of worms not 
worth opening.


+
+   This is fed three things by the frontend:
+
+   (A) a series of location_t by the frontend's tokenizer,
+   corresponding to the locations of the token stream.  It uses
+   this to model a stack of "visual blocks", corresponding to
+   indentation levels within the source code.

I don't understand why you need more than three, one for the conditional 
statement, another for the solo statement and another for the next-statement. 
I also don't get why you need to pass the whole block and new_stmt to 
visual_block, don't the locations returned by EXPR_LOCATION suffice?

Given that, the C++ tentative parsing should not be a problem since there is no 
tentative parsing (or there should not be) of if/else/while, and you only care 
about the first token when parsing the other two statements.

@@ -236,6 +236,9 @@ c_lex_one_token (c_parser *parser, c_token *token)
    token->keyword = RID_MAX;
    token->pragma_kind = PRAGMA_NONE;

+  if (vis_parser)
+    vis_parser->on_token (token->location);
+

Why you need to save every location? Why not just do this when the parser 
detects an if/else/while? Then,

@@ -5081,7 +5086,11 @@ c_parser_if_body (c_parser *parser, bool *if_p)
    else if (c_parser_next_token_is (parser, CPP_OPEN_BRACE))
      add_stmt (c_parser_compound_statement (parser));
    else
-    c_parser_statement_after_labels (parser);
+    {
+      c_parser_statement_after_labels (parser);
+      if (vis_parser)
+	vis_parser->on_solo_stmt (block, if_loc);
+    }
    return c_end_compound_stmt (body_loc, block, flag_isoc99);
  }

You can simply here peek the location of the next token before and after 
c_parser_statement_after_labels (parser). Then warn if they are equal or if 
they are on the same line, no? You may need to skip pragmas.


+   Note that we can't simply use statement locations; for example, in:
+     if (flag)
+       x = 1;
+   the "if (flag)"'s location is at the open-paren, and that of the
+   assignment "x = 1;" is at the equals-sign, so any attempt to use
+   statement locations can be fooled by varying the spacing:

Aren't these bugs (or missing features) in the FE locations? I know that the 
location of the assignment is at '=' because we don't have (yet) locations for 
constants or variables (all binary expressions should have 3 locations!).

In any case, don't you have the location of the token that starts the 
statements? This is all you need as far as I understand.

And why the 'if' points to the open paren? One can easily find the open-paren 
location if needed by seeking in the input buffer.


+	  /* FIXME: what about entirely empty lines???
+	     Presumably they simply don't get tokens.  */

Empty lines no, but preprocessor directives do.

Cheers,

Manuel.



More information about the Gcc-patches mailing list