This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/4] (v3) On-demand locations within string-literals


On Thu, 2016-08-04 at 11:37 -0600, Jeff Law wrote:
> On 08/03/2016 09:45 AM, David Malcolm wrote:
> > Changes in v3:
> > - Avoid including cpplib.h from input.h
> > - Properly handle stringified macro arguments (with tests for this)
> > - Minor whitespace fixes
> > - Move selftest.h changes to a separate patch
> > 
> > Changes in v2:
> > - Tweaks to substring location selftests
> > - Many more selftests (EBCDIC, the various wide string types, etc)
> > - Clean up conditions in charset.c; require source == execution
> > charset
> >   to have substring locations
> > - Make string_concat_db field private
> > - Return error messages rather than bool
> > - Fix source_range for charset.c:convert_escape
> > - Introduce class substring_loc
> > - Handle bad input locations more gracefully
> > - Ensure that we can read substring information for a token which
> >   starts in one linemap and ends in another (seen in
> >   gcc.dg/cpp/pr69985.c)
> > 
> > This version addresses Joseph's qn about stringification of macro
> > arguments (by failing gracefully on them), and the modularity
> > concerns noted by Manu.
> > 
> > Successfully bootstrapped&regrtested in conjunction with the rest
> > of the
> > patch kit on x86_64-pc-linux-gnu.
> > 
> > v2 of the kit successfully passes a full config-list.mk and a
> > successful selftest
> > run for stage 1 on powerpc-ibm-aix7.1.3.0 (gcc111), both in
> > conjunction with the
> > rest of the patch kit; I plan to repeat those tests.
> > 
> > I believe I can self-approve the changes to input.c, input.h,
> > libcpp,
> > and the testsuite; the remaining changes needing approval are those
> > to c-family and to gcc.c.
> I think that's a fair assessment.  You might consider pulling those
> out 
> as a distinct hunk in the future -- if you haven't noticed, I often
> try 
> to knock out the smaller patches first (without even looking to see
> how 
> much might be bits the author can self-approve).
> 
> 
> > 
> > OK for trunk if it passes testing? (by itself)
> > 
> > 
> > gcc/c-family/ChangeLog:
> > 	* c-common.c: Include "substring-locations.h".
> > 	(get_cpp_ttype_from_string_type): New function.
> > 	(g_string_concat_db): New global.
> > 	(substring_loc::get_range): New method.
> > 	* c-common.h (g_string_concat_db): New declaration.
> > 	(class substring_loc): New class.
> > 	* c-lex.c (lex_string): When concatenating strings, capture the
> > 	locations of all tokens using a new obstack, and record the
> > 	concatenation locations within g_string_concat_db.
> > 	* c-opts.c (c_common_init_options): Construct
> > g_string_concat_db
> > 	on the ggc-heap.
> > 
> > gcc/ChangeLog:
> > 	* gcc.c (cpp_options): Rename string to...
> > 	(cpp_options_): ...this, to avoid clashing with struct in
> > 	cpplib.h.
> > 	(static_specs): Update initialize for above renaming
> > 	* input.c (string_concat::string_concat): New constructor.
> > 	(string_concat_db::string_concat_db): New constructor.
> > 	(string_concat_db::record_string_concatenation): New method.
> > 	(string_concat_db::get_string_concatenation): New method.
> > 	(string_concat_db::get_key_loc): New method.
> > 	(class auto_cpp_string_vec): New class.
> > 	(get_substring_ranges_for_loc): New function.
> > 	(get_source_range_for_substring): New function.
> > 	(get_num_source_ranges_for_substring): New function.
> > 	(class selftest::lexer_test_options): New class.
> > 	(struct selftest::lexer_test): New struct.
> > 	(class selftest::ebcdic_execution_charset): New class.
> > 	(selftest::ebcdic_execution_charset::s_singleton): New
> > variable.
> > 	(selftest::lexer_test::lexer_test): New constructor.
> > 	(selftest::lexer_test::~lexer_test): New destructor.
> > 	(selftest::lexer_test::get_token): New method.
> > 	(selftest::assert_char_at_range): New function.
> > 	(ASSERT_CHAR_AT_RANGE): New macro.
> > 	(selftest::assert_num_substring_ranges): New function.
> > 	(ASSERT_NUM_SUBSTRING_RANGES): New macro.
> > 	(selftest::assert_has_no_substring_ranges): New function.
> > 	(ASSERT_HAS_NO_SUBSTRING_RANGES): New macro.
> > 	(selftest::test_lexer_string_locations_simple): New function.
> > 	(selftest::test_lexer_string_locations_ebcdic): New function.
> > 	(selftest::test_lexer_string_locations_hex): New function.
> > 	(selftest::test_lexer_string_locations_oct): New function.
> > 	(selftest::test_lexer_string_locations_letter_escape_1): New
> > function.
> > 	(selftest::test_lexer_string_locations_letter_escape_2): New
> > function.
> > 	(selftest::test_lexer_string_locations_ucn4): New function.
> > 	(selftest::test_lexer_string_locations_ucn8): New function.
> > 	(selftest::uint32_from_big_endian): New function.
> > 	(selftest::test_lexer_string_locations_wide_string): New
> > function.
> > 	(selftest::uint16_from_big_endian): New function.
> > 	(selftest::test_lexer_string_locations_string16): New function.
> > 	(selftest::test_lexer_string_locations_string32): New function.
> > 	(selftest::test_lexer_string_locations_u8): New function.
> > 	(selftest::test_lexer_string_locations_utf8_source): New
> > function.
> > 	(selftest::test_lexer_string_locations_concatenation_1): New
> > 	function.
> > 	(selftest::test_lexer_string_locations_concatenation_2): New
> > 	function.
> > 	(selftest::test_lexer_string_locations_concatenation_3): New
> > 	function.
> > 	(selftest::test_lexer_string_locations_macro): New function.
> > 	(selftest::test_lexer_string_locations_stringified_macro_argume
> > nt):
> > 	New function.
> > 	(selftest::test_lexer_string_locations_non_string): New
> > function.
> > 	(selftest::test_lexer_string_locations_long_line): New
> > function.
> > 	(selftest::test_lexer_char_constants): New function.
> > 	(selftest::input_c_tests): Call the new test functions once per
> > 	case within the line_table test matrix.
> > 	* input.h (struct string_concat): New struct.
> > 	(struct location_hash): New struct.
> > 	(class string_concat_db): New class.
> > 	* substring-locations.h: New header.
> > 
> > gcc/testsuite/ChangeLog:
> > 	* gcc.dg/plugin/diagnostic-test-string-literals-1.c: New file.
> > 	* gcc.dg/plugin/diagnostic-test-string-literals-2.c: New file.
> > 	* gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: New
> > file.
> > 	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above
> > new files.
> > 
> > libcpp/ChangeLog:
> > 	* charset.c (cpp_substring_ranges::cpp_substring_ranges): New
> > 	constructor.
> > 	(cpp_substring_ranges::~cpp_substring_ranges): New destructor.
> > 	(cpp_substring_ranges::add_range): New method.
> > 	(cpp_substring_ranges::add_n_ranges): New method.
> > 	(_cpp_valid_ucn): Add "char_range" and "loc_reader" params; if
> > 	they are non-NULL, read position information from *loc_reader
> > 	and update char_range->m_finish accordingly.
> > 	(convert_ucn): Add "char_range", "loc_reader", and "ranges"
> > 	params.  If loc_reader is non-NULL, read location information
> > from
> > 	it, and update *ranges accordingly, using char_range.
> > 	Conditionalize the conversion into tbuf on tbuf being non-NULL.
> > 	(convert_hex): Likewise, conditionalizing the call to
> > 	emit_numeric_escape on tbuf.
> > 	(convert_oct): Likewise.
> > 	(convert_escape): Add params "loc_reader" and "ranges".  If
> > 	loc_reader is non-NULL, read location information from it, and
> > 	update *ranges accordingly.  Conditionalize the conversion into
> > 	tbuf on tbuf being non-NULL.
> > 	(cpp_interpret_string): Rename to...
> > 	(cpp_interpret_string_1): ...this, adding params "loc_readers"
> > and
> > 	"out".  Use "to" to conditionalize the initialization and usage
> > of
> > 	"tbuf", such as running the converter.  If "loc_readers" is
> > 	non-NULL, use the instances within it, reading location
> > 	information from them, and passing them to convert_escape;
> > likewise
> > 	write to "out" if loc_readers is non-NULL.  Check for leading
> > 	quote and issue an error if it is not present.  Update boundary
> > 	check from "== limit" to ">= limit" to protect against
> > erroneous
> > 	location values to calls that are not parsing string literals.
> > 	(cpp_interpret_string): Reimplement in terms to
> > 	cpp_interpret_string_1.
> > 	(noop_error_cb): New function.
> > 	(cpp_interpret_string_ranges): New function.
> > 	(cpp_string_location_reader::cpp_string_location_reader): New
> > 	constructor.
> > 	(cpp_string_location_reader::get_next): New method.
> > 	* include/cpplib.h (class cpp_string_location_reader): New
> > class.
> > 	(class cpp_substring_ranges): New class.
> > 	(cpp_interpret_string_ranges): New prototype.
> > 	* internal.h (_cpp_valid_ucn): Add params "char_range" and
> > 	"loc_reader".
> > 	* lex.c (forms_identifier_p): Pass NULL for new params to
> > 	_cpp_valid_ucn.
> > ---
> > 
> > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> > index 27031b5..7a8b6ea 100644
> > --- a/gcc/c-family/c-common.c
> > +++ b/gcc/c-family/c-common.c
> > @@ -1098,6 +1099,67 @@ fix_string_type (tree value)
> >    TREE_STATIC (value) = 1;
> >    return value;
> >  }
> > +
> > +/* Given a string of type STRING_TYPE, determine what kind of
> > string
> > +   token created it: CPP_STRING, CPP_STRING16, CPP_STRING32, or
> > +   CPP_WSTRING.  Return CPP_OTHER in case of error.
> > +
> > +   This effectively reverses part of the logic in
> > +   lex_string and fix_string_type.  */
> > +
> > +static enum cpp_ttype
> > +get_cpp_ttype_from_string_type (tree string_type)
> > +{
> > +  gcc_assert (string_type);
> > +  if (TREE_CODE (string_type) != ARRAY_TYPE)
> > +    return CPP_OTHER;
> > +
> > +  tree element_type = TREE_TYPE (string_type);
> > +  if (TREE_CODE (element_type) != INTEGER_TYPE)
> > +    return CPP_OTHER;
> > +
> > +  int bits_per_character = TYPE_PRECISION (element_type);
> > +  switch (bits_per_character)
> > +    {
> > +    case 8:
> > +      return CPP_STRING;  /* It could have also been
> > CPP_UTF8STRING.  */
> > +    case 16:
> > +      return CPP_STRING16;
> > +    case 32:
> > +      return CPP_STRING32;
> > +    }
> > +
> > +  if (bits_per_character == TYPE_PRECISION (wchar_type_node))
> > +    return CPP_WSTRING;
> Doesn't the switch above effectively mean we don't use CPP_WSTRING? 
>  In 
> what cases do you expect it to be used?

I was attempting to provide an inverse of lex_string and
fix_string_type, going back from a tree type to a cpp_ttype.  The
purpose of the ttype is to determine the execution charset of a
STRING_CST to enforce the requirement in cpp_interpret_string_ranges
that there's a 1:1 correspondence between bytes in the source encoding
and bytes in the execution encoding.

c-lex.c: lex_string has:
	case CPP_WSTRING:
	  value = build_string (TYPE_PRECISION (wchar_type_node)
				/ TYPE_PRECISION (char_type_node),
				"\0\0\0");  /* widest supported wchar_t
					       is 32 bits */

Given that, it looks like it's not possible for that conditional to
fire (unless we somehow have a 24-bit wchar_t???)

Should I just drop the CPP_WSTRING conditional?  (and update the
function comment, to capture the fact that the cpp_ttype is one with
the same execution encoding as the STRING_CST, not necessarily equal to
the exact cpp_ttype that was in use).

> > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > index 8c80574..7b5da57 100644
> > --- a/gcc/c-family/c-common.h
> > +++ b/gcc/c-family/c-common.h
> > @@ -1110,6 +1110,35 @@ extern time_t cb_get_source_date_epoch
> > (cpp_reader *pfile);
> >     __TIME__ can store.  */
> >  #define MAX_SOURCE_DATE_EPOCH HOST_WIDE_INT_C (253402300799)
> > 
> > +extern GTY(()) string_concat_db *g_string_concat_db;
> Presumably this DB needs to persist through the entire compilation
> unit 
> and the nodes inside reference GC'd objects, right?  Just want to
> make 
> 100% sure that we need to expose this to the GC system before ack
> -ing.

It needs to persist for as long as we might make queries about
substring locations.  It doesn't reference GC'd objects.  However it
might be desirable to locate locations within string constants loaded
from PCH files, hence I put it in GTY memory.  Is this overkill?

> The rest looks reasonable.  So we just need to reach closure on those
> two issues IMHO.
> 
> jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]