This is the mail archive of the
mailing list for the GCC project.
Re: [RFC]: designing customizable format attributes
On Tue, 11 Jul 2005, Ian Lance Taylor wrote:
> Intuitively the most logical approach would seem to be a state machine
> driven by the characters in the format string. Periodically the state
> machine would emit a type. For example
This is a lot closer to what I think the data structures should end up
looking like. That is why I think adding the feature with the present
data structures would be premature, and why I think it would be dangerous
to let the current data structures have too much influence on the
appearance of the user interface to the feature.
(One previous list discussion suggested regular expressions. Though more
or less isomorphic to state machines, I don't think they are a good match
to this particular problem.)
I'd use something a bit higher level than your state machine, to better
represent the structure format strings actually have. For example,
parsing length modifiers might call a subroutine "find the first one of
the strings in this list which matches at this point in the string [the
empty string would be last] and record its index in this register". Then,
after the conversion specifier has been parsed, "look up an entry in this
two-dimensional array indexed by these two registers" would be called to
find the type, if any, for the given length and conversion specifiers.
The various subroutines would have ways to specify diagnostics for not
matching.
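
A minimal sketch in C of the two subroutines described above, not taken
from GCC itself. The names match_first, directive_type, length_mods,
conv_chars and type_table are hypothetical, the tables cover only a few
printf directives, and type names are plain strings purely for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-modifier list.  Longer strings precede their
   prefixes, and the empty string is last so that it always matches.  */
static const char *const length_mods[] = { "hh", "h", "ll", "l", "" };
enum { N_LENGTHS = sizeof length_mods / sizeof length_mods[0] };

/* Hypothetical conversion specifiers covered by this sketch.  */
static const char conv_chars[] = "dus";
enum { N_CONVS = 3 };

/* Type names indexed by [length][conversion]; NULL marks an invalid pair.  */
static const char *const type_table[N_LENGTHS][N_CONVS] = {
  /* hh */ { "signed char", "unsigned char", NULL },
  /* h  */ { "short", "unsigned short", NULL },
  /* ll */ { "long long", "unsigned long long", NULL },
  /* l  */ { "long", "unsigned long", "wchar_t *" },
  /* "" */ { "int", "unsigned int", "char *" },
};

/* Find the first string in LIST that matches at *P, record its index in
   *REG and advance *P past it.  Return 0 if nothing matched.  */
static int
match_first (const char **p, const char *const *list, size_t n, size_t *reg)
{
  assert (p != NULL && *p != NULL && reg != NULL);
  for (size_t i = 0; i < n; i++)
    {
      size_t len = strlen (list[i]);
      if (strncmp (*p, list[i], len) == 0)
        {
          *reg = i;
          *p += len;
          return 1;
        }
    }
  return 0;
}

/* Parse a length modifier and a conversion specifier starting at S and
   return the expected type name, or NULL if the pair is not valid.  A
   real checker would emit a diagnostic where this returns NULL.  */
static const char *
directive_type (const char *s)
{
  size_t len_reg, conv_reg;
  if (!match_first (&s, length_mods, N_LENGTHS, &len_reg))
    return NULL;
  const char *c = (*s != '\0') ? strchr (conv_chars, *s) : NULL;
  if (c == NULL)
    return NULL;
  conv_reg = (size_t) (c - conv_chars);
  return type_table[len_reg][conv_reg];
}
```

With these tables, directive_type ("lld") yields "long long" because
"ll" is tried before "l", and directive_type ("hs") yields NULL because
the table has no entry for that pair.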
Backtracking would also be simpler than with a pure state machine. A
decimal number after % could be either a width or an operand number, so
"optionally parse a number followed by $, storing it in this register if
found" would be called first. If the number wasn't followed by $, the
"parse width" routine would find itself still at the start of the number,
rather than needing a state machine to do "parse number" then "if $ then
operand number else width".
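
A minimal sketch in C of that backtracking, again not taken from GCC.
The names parse_optional_operand_number, parse_optional_width,
parse_prefix and struct directive_prefix are hypothetical, and overflow
of very long numbers is ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* If a decimal number followed by '$' starts at *P, store it in *REG,
   advance *P past the '$' and return 1.  Otherwise leave *P untouched
   and return 0, so the next routine starts at the same position.  */
static int
parse_optional_operand_number (const char **p, unsigned *reg)
{
  assert (p != NULL && *p != NULL && reg != NULL);
  const char *s = *p;
  unsigned n = 0;
  if (!isdigit ((unsigned char) *s))
    return 0;
  while (isdigit ((unsigned char) *s))
    n = n * 10 + (unsigned) (*s++ - '0');
  if (*s != '$')
    return 0;               /* Not an operand number: backtrack.  */
  *reg = n;
  *p = s + 1;
  return 1;
}

/* If a decimal number starts at *P, store it in *REG as the width,
   advance *P past it and return 1.  Otherwise return 0.  */
static int
parse_optional_width (const char **p, unsigned *reg)
{
  assert (p != NULL && *p != NULL && reg != NULL);
  const char *s = *p;
  unsigned n = 0;
  if (!isdigit ((unsigned char) *s))
    return 0;
  while (isdigit ((unsigned char) *s))
    n = n * 10 + (unsigned) (*s++ - '0');
  *reg = n;
  *p = s;
  return 1;
}

/* Hypothetical record of what was found after the '%'.  */
struct directive_prefix
{
  unsigned operand, width;
  int has_operand, has_width;
};

/* Parse the text just after '%' and set *END to the first unparsed
   character.  */
static struct directive_prefix
parse_prefix (const char *s, const char **end)
{
  struct directive_prefix d = { 0, 0, 0, 0 };
  d.has_operand = parse_optional_operand_number (&s, &d.operand);
  d.has_width = parse_optional_width (&s, &d.width);
  *end = s;
  return d;
}
```

For "2$10d" this records operand 2 and width 10. For "10d" the operand
routine fails without consuming anything, so the width routine parses 10
from the start of the number.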
I think it should be possible to move towards such structures
incrementally, gradually moving more logic into the datastructures and
reducing the number of special cases with their own flags or code.
> Since the goal is to produce an attribute string, we can see that it's
> pretty easy to describe this kind of state machine using a little
> language. LABEL is [0-9]+. CHAR is any character in the string.
> TYPENAME is any string, meant to be the name of a type.
I think an interface taking some form of list of strings and types would
be better, to avoid calling back into the lexer and parser to decode type
names extracted from the string.
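
As a rough illustration in C of what the checker might hold once such a
list has been processed: each entry pairs a directive string with a type
that is already resolved, so no type name has to be parsed out of a
string later. Everything here is hypothetical, including enum arg_type
(a stand-in for the compiler's internal type representation), struct
conv_entry and lookup_conversion; it is not a proposed attribute syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for an already-resolved type; hypothetical.  */
enum arg_type { ARG_NONE, ARG_INT, ARG_LONG, ARG_CHAR_PTR, ARG_TREE };

/* One user-supplied pair: the directive text and the type it expects.  */
struct conv_entry
{
  const char *spec;
  enum arg_type type;
};

/* Return the type of the first entry in LIST whose SPEC matches at S,
   storing the number of characters matched in *CONSUMED.  Return
   ARG_NONE and store 0 if no entry matches.  */
static enum arg_type
lookup_conversion (const struct conv_entry *list, size_t n,
                   const char *s, size_t *consumed)
{
  assert (list != NULL && s != NULL && consumed != NULL);
  for (size_t i = 0; i < n; i++)
    {
      size_t len = strlen (list[i].spec);
      if (len > 0 && strncmp (s, list[i].spec, len) == 0)
        {
          *consumed = len;
          return list[i].type;
        }
    }
  *consumed = 0;
  return ARG_NONE;
}
```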
> In any case, what would make this useful is to be able to say "this
> attribute string is printf plus the following". Then the string would
> add to and override the state machine created from the default printf
In the state machine model that is difficult to do, because it depends on
fine details of how the printf state machine is implemented.
Joseph S. Myers http://www.srcf.ucam.org/~jsm28/gcc/
firstname.lastname@example.org (personal mail)
email@example.com (CodeSourcery mail)
firstname.lastname@example.org (Bugzilla assignments and CCs)