[gfortran,patch] Handle "%3$d"-style format specifiers in error.c

Brooks Moses bmoses@stanford.edu
Wed Nov 22 04:50:00 GMT 2006


(I apologize for the lack of threading on this message; I'm away from my
main computer and thus can only read the group via the web archives.  
Sending me a cc of replies would be appreciated!)

A few comments on this patch:

> +  struct
> +  {
> +    int type;
> +    int pos;
> +    union
> +    {
> +      locus * locval;
> +      int intval;
> +      char charval;
> +      char * stringval;
> +    } u;
> +  } percent[MAX_ARGS], arg[MAX_ARGS];

I like the use of the struct here; this seems a bit more elegant than
the old way of using separate arrays for the argument types.  However, I
would object to the array called "percent", on grounds that the name
doesn't mean anything to me as I'm reading this.

Also, I would expect arg[] to be the array of arguments in the order
that they're specified in the function call, but it appears that it is
the list in the order they appear in the format statement instead, so
that should probably get a clearer name too.  Or at least an explanatory
comment.

> +      if (ISDIGIT (*format))
>  	{
> -	  c = *format++;
> +	  /* This is a position specifier.  */

I think I'd prefer a slightly longer comment there (with an example,
maybe -- e.g. "For example, '%3$d' specifies the third argument,
formatted in %d format.")  And referencing "man 3 printf" is, I think,
definitely a useful thing to include.  Now that I've puzzled through the
patch, "position specifier" has an obvious meaning, but it didn't when I
first started looking at this!

> +      if (use_dollars == -1)
> +	  use_dollars = 0;
> +      if (!use_dollars)
> +	  pos++;

I'm having a hard time following the uses of the use_dollars logic,
particularly since it's got a binary name but has three values (+1, -1,
and 0) in various points in the code.  What does it achieve that would
not be achieved with a construct along the lines of 

  if (ISDIGIT (*format))
    {
      ...existing code....
    }
  else
    pos++

and leaving the use_dollars variable out entirely?  It seems that there
would only be a change to the behavior when a format item without a
position specifier follows one that has a position specifier, but
according to the printf manpage that produces "indeterminate" results
and so shouldn't matter.

(Perhaps the gcc_assert(use_dollars != 0) was intended as a check to
ensure that the with-dollarsign and without-dollarsign forms are not
interspersed?  But it doesn't do that; it only catches the case where
the first format item omits a position specifier and a later one
includes one.  If that's the intent, I still think it's clearer to only
use it for checking purposes rather than for program logic, though.)

> -	    case '\0':
> -	      format--;
> -	      break;
[...]
> +	  default:
> +	    gcc_unreachable ();

That looks like a good replacement of the case '\0' code I added to
avoid buffer overflows.  But the case '\0' code is retained in the
second switch statement later.  Perhaps you could replace it there as
well?  (Or just eliminate it, since this check will now error out on the
first pass, and we'll never get to the second pass.)

> +      if (ISDIGIT(*format))
> +	{
> +	  while (ISDIGIT(*format))
> +	    format++;
> +	  format++;
> +	}

Personally, I'd like to see a /* Skip over the '$' */ comment
immediately before that second format++ there, so it's obvious why it's
there, but I think my comment preferences are a bit heavier than what's
common in GCC code.  :)

> +	  case TYPE_CURRENTLOC:
> +	    percent[pos].u.locval = &gfc_current_locus;
> +	    break;
> +
> +	  case TYPE_LOCUS:
> +	    percent[pos].u.locval = va_arg (argp, locus *);
> +	    break;
[...]
> +      if (percent[pos].type == TYPE_CURRENTLOC
> +	  || percent[pos].type == TYPE_LOCUS)
> +	{
> +	  if (have_l1)
> +	    l2 = percent[pos].u.locval;
> +	  else
> +	    {
> +	      l1 = percent[pos].u.locval;
> +	      have_l1 = 1;
>  	    }
>  	}

I admit that I don't see the advantage of splitting this out and storing
the locus pointer in the union.  How about the following instead, which 
is a bit shorter?

       case TYPE_LOCUS:
         loc = va_arg (argp, locus *);   
         /* Fall through */
       case TYPE_CURRENTLOC:
         if (percent[pos].type == TYPE_CURRENTLOC)
           loc = &gfc_current_locus;
         if (have_l1)
           l2 = percent[pos].u.locval;
         else
           {
             l1 = percent[pos].u.locval;
             have_l1 = 1;
           }
         break;

Or, since we are doing the l1/l2 switching logic here anyway, and have a
handy string pointer to put things in, how about this?

       case TYPE_LOCUS:
         loc = va_arg (argp, locus *);   
         /* Fall through */
       case TYPE_CURRENTLOC:
         if (percent[pos].type == TYPE_CURRENTLOC)
           loc = &gfc_current_locus;
         if (have_l1)
           {
             l2 = loc;
             percent[pos].u.stringval = "(2)"
         else
           {
             l1 = loc;
             have_l1 = 1;
             percent[pos].u.stringval = "(1)"
           }
         break;

Then, in the second switch statement, the 'C' and 'L' cases can just
fallthrough to the 's' case, and there's no need for repeating the
have_l1 logic trick.

Other than that, it looks good to me.

(Actually, to be honest, I think it looks far more complicated than an
error-printing routine ought to have to be, but I can't see any way to
make it any simpler with the calling syntax we're stuck with, so I blame
the syntax and not the implementer! :) )

- Brooks



More information about the Gcc-patches mailing list