This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Status of rich location work (was Re: [PATCH 06/10] Track expression ranges in C frontend)


On Fri, 2015-10-30 at 00:15 -0600, Jeff Law wrote:
> On 10/23/2015 02:41 PM, David Malcolm wrote:
> > As in the previous version of this patch
> >   "Implement tree expression tracking in C FE (v2)"
> > the patch now captures ranges for all C expressions during parsing within
> > a new field of c_expr, and for all tree nodes with a location_t, it stores
> > them in ad-hoc locations for later use.
> >
> > Hence compound expressions get ranges; see:
> >    https://dmalcolm.fedorapeople.org/gcc/2015-09-22/diagnostic-test-expressions-1.html
> >
> > and for this example:
> >
> >    int test (int foo)
> >    {
> >      return foo * 100;
> >             ^^^   ^^^
> >    }
> >
> > we have access to the ranges of "foo" and "100" during C parsing via
> > the c_expr, but once we have GENERIC, all we have is a VAR_DECL and an
> > INTEGER_CST (the former's location is in at the top of the
> > function, and the latter has no location).
> >
> > gcc/ChangeLog:
> > 	* Makefile.in (OBJS): Add gcc-rich-location.o.
> > 	* gcc-rich-location.c: New file.
> > 	* gcc-rich-location.h: New file.
> > 	* print-tree.c (print_node): Print any source range information.
> > 	* tree.c (set_source_range): New functions.
> > 	* tree.h (CAN_HAVE_RANGE_P): New.
> > 	(EXPR_LOCATION_RANGE): New.
> > 	(EXPR_HAS_RANGE): New.
> > 	(get_expr_source_range): New inline function.
> > 	(DECL_LOCATION_RANGE): New.
> > 	(set_source_range): New decls.
> > 	(get_decl_source_range): New inline function.
> >
> > gcc/c-family/ChangeLog:
> > 	* c-common.c (c_fully_fold_internal): Capture existing souce_range,
> > 	and store it on the result.
> >
> > gcc/c/ChangeLog:
> > 	* c-parser.c (set_c_expr_source_range): New functions.
> > 	(c_token::get_range): New method.
> > 	(c_token::get_finish): New method.
> > 	(c_parser_expr_no_commas): Call set_c_expr_source_range on the ret
> > 	based on the range from the start of the LHS to the end of the
> > 	RHS.
> > 	(c_parser_conditional_expression): Likewise, based on the range
> > 	from the start of the cond.value to the end of exp2.value.
> > 	(c_parser_binary_expression): Call set_c_expr_source_range on
> > 	the stack values for TRUTH_ANDIF_EXPR and TRUTH_ORIF_EXPR.
> > 	(c_parser_cast_expression): Call set_c_expr_source_range on ret
> > 	based on the cast_loc through to the end of the expr.
> > 	(c_parser_unary_expression): Likewise, based on the
> > 	op_loc through to the end of op.
> > 	(c_parser_sizeof_expression) Likewise, based on the start of the
> > 	sizeof token through to either the closing paren or the end of
> > 	expr.
> > 	(c_parser_postfix_expression): Likewise, using the token range,
> > 	or from the open paren through to the close paren for
> > 	parenthesized expressions.
> > 	(c_parser_postfix_expression_after_primary): Likewise, for
> > 	various kinds of expression.
> > 	* c-tree.h (struct c_expr): Add field "src_range".
> > 	(c_expr::get_start): New method.
> > 	(c_expr::get_finish): New method.
> > 	(set_c_expr_source_range): New decls.
> > 	* c-typeck.c (parser_build_unary_op): Call set_c_expr_source_range
> > 	on ret for prefix unary ops.
> > 	(parser_build_binary_op): Likewise, running from the start of
> > 	arg1.value through to the end of arg2.value.
> >
> > gcc/testsuite/ChangeLog:
> > 	* gcc.dg/plugin/diagnostic-test-expressions-1.c: New file.
> > 	* gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c:
> > 	New file.
> > 	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
> > 	diagnostic_plugin_test_tree_expression_range.c and
> > 	diagnostic-test-expressions-1.c.
> 
> >   /* Initialization routine for this file.  */
> >
> > @@ -6085,6 +6112,9 @@ c_parser_expr_no_commas (c_parser *parser, struct c_expr *after,
> >     ret.value = build_modify_expr (op_location, lhs.value, lhs.original_type,
> >   				 code, exp_location, rhs.value,
> >   				 rhs.original_type);
> > +  set_c_expr_source_range (&ret,
> > +			   lhs.get_start (),
> > +			   rhs.get_finish ());
> One line if it fits.
> 
> 
> > @@ -6198,6 +6232,9 @@ c_parser_conditional_expression (c_parser *parser, struct c_expr *after,
> >   			   ? t1
> >   			   : NULL);
> >       }
> > +  set_c_expr_source_range (&ret,
> > +			   start,
> > +			   exp2.get_finish ());
> Here too.
> 
> > @@ -6522,6 +6564,10 @@ c_parser_cast_expression (c_parser *parser, struct c_expr *after)
> >   	expr = convert_lvalue_to_rvalue (expr_loc, expr, true, true);
> >         }
> >         ret.value = c_cast_expr (cast_loc, type_name, expr.value);
> > +      if (ret.value && expr.value)
> > +	set_c_expr_source_range (&ret,
> > +				 cast_loc,
> > +				 expr.get_finish ());
> And here?
> 
> With the nits fixed, this is OK.
> 
> I think that covers this iteration of the rich location work and that 
> you'll continue working with Jason on extending this into the C++ front-end.

Here's a summary of the current status of this work [1]:

Patches 1-4 of the kit: these Jeff has approved, with some pre-approved
nit fixes in 4.  I see these as relatively low risk, and plan to commit
these today/tomorrow.

Patches 5-10: Jeff approved these also (again with some nits). These
feel higher-risk to me, owing to the potential for performance
regressions; I haven't yet answered at least one of Richi's performance
questions (impact on time taken to generate the C++ PCH file); the last
performance testing I did can be seen here:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02283.html
where the right-most column is this kit.

CCing Richi to keep him in the loop for the above.  Richi, is there any
other specific testing you'd want me to do for this?
Or is it OK to commit, and to see what impact it has on your daily
performance testing?  (and to revert if the impact is unacceptable).

Talking about risks: the reduction of the space for ordinary maps by a
factor of 32, by taking up 5 bits for the packed range information
optimization (patch 10):
 https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02539.html
CCing Dodji: Dodji; is this reasonable?

I did some experiments back in July on this; instrumented builds of
openldap and openssl to see how much space we have in location_t:
https://dmalcolm.fedorapeople.org/gcc/2015-07-23/openldap.csv
https://dmalcolm.fedorapeople.org/gcc/2015-07-24/openldap.csv

(these are space-separated:
           SRPM name
           sourcefile
           maximal ordinary location_t
           minimal macro location_t)

openldap build #files: 906
maximal ordinary location_t was:
sourcefile='/builddir/build/BUILD/openldap-2.4.40/openldap-2.4.40/servers/slapd/bconfig.c'
          max_ordinary_location=0x0081bd1b
          (and min_macro_location=0x7ffe5903
minimal macro location_t was:
sourcefile='/builddir/build/BUILD/openldap-2.4.40/openldap-2.4.40/servers/slapd/aclparse.c'
          min_macro_location=0x7ffe57e2
          (with max_ordinary_location=0x00719775)

openssl-1.0.1k-8.fc22.src.rpm.x86_64:
      #files: 1495
max_ordinary_location=0x00be3726
 (openssl-1.0.1k/apps/s_client.c)
 with min_macro_location=0x7ffe7b6b

min_macro_location=0x7ffdf069 
 (openssl-1.0.1k/apps/speed.c)
 with max_ordinary_location=0x00a1abdf

In all of the above cases, we had enough room to do the bit-packing
optimization, but this is just two projects (albeit real-world C code).

Comparing the gap between maximal ordinary map location and minimal
macro map location, and seeing how much we can expand the ordinary map
locations, the openldap build had:
  (0x7ffe57e2 - 0x0081bd1b) / 0x0081bd1b  == factor of 251 i.e.
7 bits of space available

openssl build had:
  (0x7ffdf069 - 0x00be3726) / 0x00be3726  == factor of 171 i.e. 7 bits
of space available

hence allocating 5 bits to packing ranges is (I hope) reasonable.


Jeff: I'm working on expression ranges in the C++ FE; is that a
prerequisite for patches 5-10, or can 5-10 go ahead without the C++
work?  (assuming the other issues above are acceptable).

Hope this all makes sense and sounds sane
Dave

[1] Together the kit gives us:
* patch 4: infrastructure for printing underlines in
diagnostic_show_locus and for multiple ranges
* patches 5-10: the "source_location" (aka location_t) type becomes a
caret plus a range; the tokens coming from libcpp gain ranges, so
everything using libcpp gains at least underlines of tokens; the C
frontend generates sane ranges for expressions as it parses them, better
showing the user how the parser "sees" their code.

Hence we ought to get underlined ranges for many diagnostics in C and C
++ with this (e.g. input_location gives an underline covering the range
of the token starting at the caret).  The "caret" should remain
unchanged from the status quo, so e.g. debugging locations shouldn't be
affected by the addition of ranges.

I'm anticipating that we'd need some followup patches to pick better
ranges for some diagnostics, analogous to the way we convert "warning"
to "warning_at" for where input_location isn't the best location; I'd
expect these followup patches to be relative simple and low-risk.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]