This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] support for -fdump-ada-spec


On Mon, 26 Apr 2010, Arnaud Charlet wrote:

> 2010-04-26  Arnaud Charlet  <charlet@adacore.com>
>             Matthew Gingell  <gingell@adacore.com>
> 
> 	* tree-dump.c (dump_files): Add ada-spec.
> 	(FIRST_AUTO_NUMBERED_DUMP): Bump to 8.
> 	* tree-pass.h (tree_dump_index): Add TDI_ada.
> 	* gcc.c: Add support for -C without -E and for -fdump-ada-spec.
> 	(cpp_unique_options): Do not reject -C or -CC when -E isn't present.
> 	(default_compilers) <@c-header>: Allow -fdump-ada-spec on header files.
> 	* c-decl.c: Include c-ada-spec.h.
> 	(collect_source_ref_cb, collect_all_refs, for_each_global_decl): New
> 	functions.
> 	(c_write_global_declarations): Add handling of -fdump-ada-spec.
> 	* c-lex.c (c_lex_with_flags): Add handling of CPP_COMMENT.
> 	* Makefile.in (C_AND_OBJC_OBJS): Add c-ada-spec.o.
> 	* c-ada-spec.h, c-ada-spec.c: New files.

The main manual (invoke.texi) needs to document the new option with a 
cross-reference to the Ada manual.

As I understand it, this feature

(a) is deliberately limited to a subset of C and C++ declarations and 
macros;

(b) may, for some input, generate invalid Ada output rather than a 
diagnostic.

If this is as intended, it doesn't itself block the feature, but it would 
seem like a good idea to document what is and is not supported.

The driver changes are OK once the rest of the changes are OK.  The 
following is not an exhaustive review of the C changes.

It looks like you will always output extended identifiers as UTF-8.  Is 
this a form that will always be acceptable as input to the Ada compiler?

> +static void
> +print_ada_macros (pretty_printer *pp, cpp_hashnode **macros, int max_ada_macros)
> +{
> +  int j, num_macros = 0, prev_line = -1;
> +
> +  for (j = 0; j < max_ada_macros; j++)
> +    {
> +      cpp_hashnode *node = macros [j];
> +      const cpp_macro *macro = node->value.macro;
> +      unsigned i;
> +      unsigned char s [4096], params [4096];
> +      unsigned char *buffer = s, *buf_param = params, *prev;

I can see no checks that you don't overflow these fixed-width buffers.  
Buffer overruns are never OK; preferably, calculate the memory needed and 
always allocate enough (which might need two passes, one to calculate the 
size and one to write the output after allocation), but failing that, at 
least bounds-check every write so an overrun cannot occur.
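For illustration only (a hedged sketch, not code from the patch; the
function and its inputs are made up), the two-pass idiom looks like:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration of the two-pass idiom: first measure the
   output, then allocate exactly enough, then write it.  Here we join
   an array of token spellings with single spaces.  */
static char *
join_tokens (const char **toks, int ntoks)
{
  size_t len = 0;
  int i;

  /* Pass 1: compute the exact size needed, counting each token plus
     either its following space or the trailing NUL.  */
  for (i = 0; i < ntoks; i++)
    len += strlen (toks[i]) + 1;
  if (ntoks == 0)
    len = 1;

  char *buf = (char *) malloc (len);
  if (!buf)
    return NULL;

  /* Pass 2: write into the correctly sized buffer; no overrun is
     possible because pass 1 measured everything we emit.  */
  char *p = buf;
  for (i = 0; i < ntoks; i++)
    {
      size_t n = strlen (toks[i]);
      memcpy (p, toks[i], n);
      p += n;
      *p++ = (i + 1 < ntoks) ? ' ' : '\0';
    }
  if (ntoks == 0)
    *p = '\0';
  return buf;
}
```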

> +	  if (!is_string)
> +	    switch (*prev)
> +	      {

What if there are both string and non-string tokens in the macro 
expansion?  And will the string code handle string concatenation?

> +		case '<':
> +		  if (prev + 2 == buffer && prev [1] == '<')
> +		    {
> +		      if (prev [-1] == '1')
> +			prev [-1] = '2';
> +		      else if (prev [-1] == ' '
> +			       && prev [-2] == '1')
> +			prev [-2] = '2';
> +
> +		      prev [0] = '*';
> +		      prev [1] = '*';
> +		    }

Quite what is this code meant to be doing?

> +		case '0':
> +		  if (prev + 1 < buffer && prev [1] == 'x')
> +		    {
> +		      /* convert 0x.... into 16#....# */

Don't you need to handle 0X the same as 0x, and handle initial 0, not 0x 
or 0X, on integer but not floating-point constants, as octal?  (Then there 
are GNU C binary constants starting 0b.)

Hex floats would also need appropriate handling of the exponent.  Floating 
constants in general can have 'f' suffixes in C.
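Something along these lines would be needed just to get the radix right
(a hedged sketch with a made-up helper name; suffix and floating-point
detection are assumed to be done by the caller):

```c
#include <assert.h>

/* Hypothetical sketch: determine the radix of a C integer constant
   from its prefix, covering the cases listed above: 0x/0X hex, GNU
   0b/0B binary, and a bare leading 0 meaning octal.  Floating
   constants (which can also start with 0, e.g. 0.5) and suffixes
   must be excluded by the caller.  A lone "0" is reported as decimal
   since the value is the same in any base.  */
static int
c_int_constant_base (const char *s)
{
  if (s[0] == '0')
    {
      if (s[1] == 'x' || s[1] == 'X')
        return 16;
      if (s[1] == 'b' || s[1] == 'B')
        return 2;   /* GNU C extension.  */
      if (s[1] != '\0')
        return 8;
    }
  return 10;
}
```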

> +		      if (*buffer != 'L' && *buffer != 'l' && *buffer != 'U')
> +			buffer++;

> +		  if (buffer [-1] == 'L'
> +		      || buffer [-1] == 'l'
> +		      || buffer [-1] == 'U')

In both these cases, you can have suffixes such as "UL" or "LLU".
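A full suffix is an arbitrary mix of u/U and l/L letters, so any correct
version has to loop rather than test one trailing character; for example
(a hedged sketch, not the patch's code):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: skip over a complete C integer suffix, which
   may be any combination such as "u", "UL", "llu" or "LLU", rather
   than a single trailing 'L'/'l'/'U'.  Returns the length of the
   constant without its suffix.  */
static size_t
int_constant_len_sans_suffix (const char *s)
{
  size_t len = strlen (s);
  while (len > 0
         && (s[len - 1] == 'u' || s[len - 1] == 'U'
             || s[len - 1] == 'l' || s[len - 1] == 'L'))
    len--;
  return len;
}
```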

In essence, the point of all these cases is that taking a C string 
representation of the macro expansion and trying to convert it to an Ada 
string is very fragile.  It would be better to take the actual tokens, 
interpret their semantics and produce equivalent Ada tokens: call 
cpp_classify_number, then the various subsequent functions the C lexer 
uses to generate trees for the constants, then write out an Ada constant 
with the same value.

Much the same applies to other types of tokens.  If you work on the 
preprocessing tokens rather than a string, new forms of constants don't 
cause problems; nor do C escape sequences in strings (I see nothing in 
this code to handle them); nor do digraphs and C++ alternate spellings.  I 
don't know if there are cases where C and Ada operator precedence differ - 
if there are, those could still cause issues in principle - but using 
preprocessing tokens for their semantics rather than their spelling seems 
generally likely to produce better results on a wider range of input, as 
well as more reliably detecting cases that you can't convert to Ada.
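Once you have the constant's value, emitting the Ada spelling is trivial
and none of the string surgery above is needed; for instance (a hedged
sketch with a made-up helper, not using the libcpp interfaces directly):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the "work from semantics" approach: once the
   token has been interpreted as a number, print an equivalent Ada
   based literal (e.g. 255 -> "16#FF#") instead of rewriting the C
   spelling character by character.  */
static void
print_ada_hex (char *buf, size_t bufsize, unsigned long value)
{
  snprintf (buf, bufsize, "16#%lX#", value);
}
```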

> +      /* Special case null for now, need to find a more generic approach */
> +      if (!strcasecmp ((const char *)NODE_NAME (node), "null"))
> +	supported = 0;

You have an array ada_reserved later in this code....
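That table could be consulted directly instead of special-casing "null";
a hedged sketch of such a lookup (the table here is abbreviated to a few
entries and is not the patch's actual ada_reserved array):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical sketch: test identifiers against a sorted table of Ada
   reserved words instead of special-casing "null" alone.  The table
   must be sorted case-insensitively for bsearch to work.  */
static const char *const ada_reserved[] = {
  "abort", "begin", "null", "type", "with"
};

static int
compare_reserved (const void *key, const void *elem)
{
  return strcasecmp ((const char *) key, *(const char *const *) elem);
}

static int
is_ada_reserved (const char *name)
{
  return bsearch (name, ada_reserved,
                  sizeof ada_reserved / sizeof *ada_reserved,
                  sizeof *ada_reserved, compare_reserved) != NULL;
}
```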

(Also, note formatting.  "." and two spaces at end of comment, space after 
(type name) in cast.)

> +  macros = (cpp_hashnode **) alloca (sizeof (cpp_hashnode *) * max_ada_macros);

There are lots of casted allocations in this patch that would better be 
using the macros from libiberty.h such as XALLOCAVEC (in this case).
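That is, instead of

  macros = (cpp_hashnode **) alloca (sizeof (cpp_hashnode *) * max_ada_macros);

write

  macros = XALLOCAVEC (cpp_hashnode *, max_ada_macros);

A self-contained sketch of the macro and its use (the definition is
reproduced from libiberty.h; the example function is made up):

```c
#include <alloca.h>
#include <assert.h>

/* XALLOCAVEC as defined by libiberty.h, reproduced here so this
   sketch compiles standalone: a typed alloca that removes the need
   for a cast at every call site.  */
#define XALLOCAVEC(T, N) ((T *) alloca (sizeof (T) * (N)))

/* Made-up example: fill a stack-allocated scratch array and sum it.  */
static int
sum_squares (int n)
{
  int *v = XALLOCAVEC (int, n);
  int i, total = 0;
  for (i = 0; i < n; i++)
    v[i] = i * i;
  for (i = 0; i < n; i++)
    total += v[i];
  return total;
}
```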

-- 
Joseph S. Myers
joseph@codesourcery.com

