This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC]: designing customizable format attributes


"Kaveh R. Ghazi" <ghazi@caipclassic.rutgers.edu> writes:

> I think it's agreed that being able to specify the format attribute's
> properties in the source code rather than hardcoding it into the
> compiler is better. :-) So I'd like to take a stab at doing this.  If
> we can come up with a good design, implementing it shouldn't be too
> hard.

I'm going to ignore the meta discussion on this topic.

I don't think it makes much sense to tie the description to the
current data structures.  I also think it should be a function
attribute rather than a pragma, though that is a minor issue.

printf format checking is a matter of comparing a format string with
an argument list.  Success is finding that the types of the arguments
match the argument types specified by the format string.  The trick is
to map the format string to a list of types.
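
As a rough illustration of the comparison step (not part of the original
proposal), the check can be sketched once both sides are reduced to lists
of type names. The function name and the message wording are assumptions
made for this sketch, and real type checking would also need to handle
promotions and compatible types.

```python
def check_arguments(expected, actual):
    """Compare the types implied by a format string with the argument types.

    Both parameters are lists of type names (plain strings). Returns a list
    of human-readable problems; an empty list means the call checks out.
    """
    problems = []
    if len(expected) != len(actual):
        problems.append(
            f"format expects {len(expected)} argument(s), got {len(actual)}"
        )
    # Compare position by position over the arguments both lists have.
    for pos, (want, got) in enumerate(zip(expected, actual), start=1):
        if want != got:
            problems.append(f"argument {pos}: format expects {want}, got {got}")
    return problems
```

The hard part, as the text says, is producing the `expected` list from the
format string in the first place.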

Intuitively the most logical approach would seem to be a state machine
driven by the characters in the format string.  Periodically the state
machine would emit a type.  For example:
   start (ANY) > start
   start ('%') > directive-start
   directive-start ('d') > TYPE(int) start
   directive-start ('l') > directive-long
   ...
   directive-long ('d') > TYPE(long) start
etc.
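
A minimal sketch of such a state machine, covering only the %d and %ld
transitions listed above. The table layout (a dict keyed by state, with
None standing for ANY) and the function name are assumptions made for
illustration, not a proposed GCC data structure.

```python
# state -> {character (or None for ANY): (emitted type or None, next state)}
TABLE = {
    "start": {"%": (None, "directive-start"), None: (None, "start")},
    "directive-start": {"d": ("int", "start"), "l": (None, "directive-long")},
    "directive-long": {"d": ("long", "start")},
}

def format_types(fmt, table=TABLE, state="start"):
    """Run the state machine over fmt and return the list of emitted types."""
    types = []
    for ch in fmt:
        row = table[state]
        if ch in row:
            emitted, state = row[ch]      # exact character match
        elif None in row:
            emitted, state = row[None]    # ANY transition
        else:
            raise ValueError(f"unexpected {ch!r} in state {state}")
        if emitted is not None:
            types.append(emitted)
    return types
```

For instance, `format_types("x=%d y=%ld")` walks start, directive-start,
and directive-long and collects one type per directive.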

Since the goal is to produce an attribute string, we can see that it's
pretty easy to describe this kind of state machine using a little
language.  LABEL is [0-9]+.  CHAR is any character in the string.
TYPENAME is any string, meant to be the name of a type.

    string: element*
    element: label character nextstate
    label: /* empty */ | LABEL ':'
    character: /* empty */ | CHAR
    nextstate: '>' type LABEL
    type: /* empty */ | '{' TYPENAME '}'

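A missing character always matches.  In the examples below, elements are
separated by ';', and an element without a label belongs to the most
recently labeled state.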

So for printf we have:

   0:%>1;>0;1:d>{int}0;l>2 ... 2:d>{long}0;

etc.
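
A hedged sketch of how such a string might be parsed and run. The SPEC
below is a trimmed version of the printf string above that covers only %d
and %ld. The ';' separators and the carrying over of the last label are
read off the example rather than the grammar, backslash quoting is not
handled, rules are tried in the order written, and all function names are
assumptions.

```python
import re

# Optional "LABEL:", zero or more characters, '>', optional "{TYPENAME}", LABEL.
ELEMENT = re.compile(r"(?:(\d+):)?([^>]*)>(?:\{([^}]*)\})?(\d+)")

SPEC = "0:%>1;>0;1:d>{int}0;l>2;2:d>{long}0;"

def parse_spec(spec):
    """Parse a spec into {state: [(characters, type or None, next state)]}."""
    table = {}
    state = None
    for element in spec.split(";"):
        if not element:
            continue                      # tolerate a trailing ';'
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"bad element {element!r}")
        label, chars, typ, nxt = m.groups()
        if label is not None:
            state = int(label)            # a label starts a new state
        if state is None:
            raise ValueError("the first element needs a label")
        table.setdefault(state, []).append((chars, typ, int(nxt)))
    return table

def emitted_types(table, fmt, state=0):
    """Run the parsed machine over fmt, starting in state 0."""
    types = []
    for ch in fmt:
        for chars, typ, nxt in table[state]:
            if not chars or ch in chars:  # a missing character always matches
                break
        else:
            raise ValueError(f"unexpected {ch!r} in state {state}")
        if typ is not None:
            types.append(typ)
        state = nxt
    return types
```

Usage: `emitted_types(parse_spec(SPEC), "%d and %ld")` yields the types for
the two directives in order.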

This isn't quite enough to handle %2$d.  For that we need to add a
feature to the little language meaning "read a number and use it as
the index of the argument to use for the following type."  We
represent this by adding an alternative to type:
    type: /* empty */ | '{' TYPENAME '}' | '{' '#' '}'
So now we have:

    1:0123456789>{#}2;d>{int}0;2:$>1

Here I introduce the notion of permitting several characters before
the '>', meaning that they all have the same effect on the state
machine.  printf flag characters would be handled this way, as they do
not affect type checking.
    character: /* empty */ | CHAR character
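
One possible reading of the '{#}' action, sketched under assumptions: the
action consumes the whole run of digits starting at the matched character
and sets the index of the next argument, and every emitted type then
advances that index by one. The table is written out by hand (state 0 is
the start state from the earlier printf string), and the names are
hypothetical.

```python
DIGITS = "0123456789"

# state -> [(characters, action, next state)]; "" matches any character.
TABLE = {
    0: [("%", None, 1), ("", None, 0)],
    1: [(DIGITS, "#", 2), ("d", "int", 0)],
    2: [("$", None, 1)],
}

def expected_types(fmt, table=TABLE):
    """Return {1-based argument index: type name} for fmt."""
    result = {}
    state, next_index, i = 0, 1, 0
    while i < len(fmt):
        ch = fmt[i]
        i += 1
        for chars, action, nxt in table[state]:
            if not chars or ch in chars:
                break
        else:
            raise ValueError(f"unexpected {ch!r} in state {state}")
        if action == "#":
            # Read the rest of the number and use it as the argument index.
            start = i - 1
            while i < len(fmt) and fmt[i] in DIGITS:
                i += 1
            next_index = int(fmt[start:i])
        elif action is not None:
            result[next_index] = action
            next_index += 1
        state = nxt
    return result
```

With this reading, "%2$d %1$d" assigns int to arguments 2 and 1, while
plain "%d%d" falls back to sequential numbering.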

While it would be tedious to write down all the printf specifiers this
way, I suspect that it is doable.  Does anybody see any constructs in
printf or other typical uses which could not be represented?

In any case, what would make this useful is to be able to say "this
attribute string is printf plus the following".  Then the string would
add to and override the state machine created from the default printf
string.
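
A small sketch of the "printf plus the following" idea, assuming tables of
the form {state: [(characters, type, next state)]} where the first matching
rule wins. Putting the extension's rules ahead of the base rules lets them
both add and override. The 'T' directive and the 'tree' type name are
purely illustrative.

```python
def extend(base, extra):
    """Return a new table in which extra's rules take precedence over base's."""
    merged = {state: list(rules) for state, rules in base.items()}
    for state, rules in extra.items():
        # Earlier rules win, so the extension goes first.
        merged[state] = list(rules) + merged.get(state, [])
    return merged

def lookup(table, state, ch):
    """Return (type or None, next state) for ch in state; first match wins."""
    for chars, typ, nxt in table[state]:
        if not chars or ch in chars:
            return typ, nxt
    raise ValueError(f"unexpected {ch!r} in state {state}")

# A tiny stand-in for the default printf table (only %d).
PRINTF = {0: [("%", None, 1), ("", None, 0)], 1: [("d", "int", 0)]}

# Hypothetical extension: add a 'T' directive and override 'd'.
CUSTOM = extend(PRINTF, {1: [("T", "tree", 0), ("d", "long", 0)]})
```

The base table is left untouched, so several functions could each extend
the same default description.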

The language is obviously very limited, and intentionally so.  It is
easy to detect a loop: the machine returns to the same state without
consuming any input.  Obviously there are constructs which cannot be
parsed using a state machine, but I don't think that these are likely
to be used by printf-like functions.  The string notation is terse
(perhaps overly so) but unambiguous.  We would use backslash quoting
for special characters like ':', '>', ';', '{', '}', '\\'.
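
A sketch of how backslash quoting could work when building and splitting
such strings, assuming a backslash always protects exactly the one
character that follows it. Both helper names are hypothetical.

```python
SPECIAL = ":>;{}\\"

def quote(text):
    """Backslash-quote every special character in text."""
    return "".join("\\" + c if c in SPECIAL else c for c in text)

def split_elements(spec):
    """Split spec on unquoted ';', leaving the escape sequences in place."""
    elements, current, i = [], [], 0
    while i < len(spec):
        c = spec[i]
        if c == "\\":
            if i + 1 >= len(spec):
                raise ValueError("dangling backslash")
            current.append(spec[i:i + 2])   # keep the quoted pair together
            i += 2
        elif c == ";":
            elements.append("".join(current))
            current = []
            i += 1
        else:
            current.append(c)
            i += 1
    if current:
        elements.append("".join(current))
    return elements
```

A state that must match a literal ';' can then be written with the quoted
form, and the splitter will not mistake it for an element separator.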

Thoughts?

Ian

