This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC]: designing customizable format attributes


"Joseph S. Myers" <joseph@codesourcery.com> writes:

> I'd use something a bit higher level than your state machine, to represent 
> better the structure format strings in fact have.  For example, length 
> modifiers might call a subroutine "find the first one of the strings in 
> this list which matches at this point in the string [the empty string 
> would be last] and record its index in this register" and then after the 
> conversion specifier has been parsed "look up an entry in this 
> two-dimensional array indexed by these two registers" would be called to 
> find the type if any for given length and conversion specifiers.

I considered that type of feature, and I'm certainly open to it.  I
decided not to include it in my initial sketch because it is not
necessary.  My goal for the little language was to keep it as simple
as possible, and to express it straightforwardly in a single string.
I don't anticipate many people will ever write these strings.  So for
me the focus was on simplicity and lack of ambiguity.

If somebody really got into it, they could write a little program
that takes a more expansive and powerful description language and
compiles it into these little strings.  But I doubt there would be
much call for such a thing.

> Backtracking would also be simpler than with a pure state machine: a 
> decimal number after % could be either a width or an operand number; 
> "optionally parse a number followed by $, storing it in this register if 
> found" would be called, so the "parse width" routine would find itself 
> still at the start of the number if it wasn't followed by $ rather than 
> needing a state machine to do "parse number" then "if $ then operand 
> number else width".

The width is irrelevant to the type to be matched.  So it's actually
very easy to handle this ambiguity in the simple state machine I
described.  After you see a number, if you then see a $ you go back to
the state after seeing a '%'.  Otherwise you go to the state after
seeing a field width.
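
As a rough illustration of the transitions just described, here is a
minimal Python sketch.  The state names, the function name
scan_after_percent and the returned tuple are assumptions made for
this example; they are not taken from c-format.c or from the proposed
attribute language, and flags and precision are ignored.

```python
DIGITS = "0123456789"

def scan_after_percent(spec):
    """Scan the text that follows a '%'.

    Returns (operand_number, width, rest).  Either number is None
    when it was not present.
    """
    state = "after_percent"
    digits = ""
    operand = None
    width = None
    pos = 0
    while pos < len(spec):
        ch = spec[pos]
        if state == "after_percent":
            if ch not in DIGITS:
                break                    # no number here at all
            digits = ch
            state = "in_number"
        elif state == "in_number":
            if ch in DIGITS:
                digits += ch
            elif ch == "$":
                # The number was an operand number: go back to the
                # state we were in right after seeing '%'.
                operand = int(digits)
                state = "after_percent"
            else:
                # Anything else means the number was a field width.
                width = int(digits)
                state = "after_width"
                break                    # leave ch for the caller
        pos += 1
    if state == "in_number":             # string ended inside a number
        width = int(digits)
    return operand, width, spec[pos:]
```

For example, scan_after_percent("1$5d") treats 1 as the operand
number and 5 as the width, and leaves "d" for the rest of the
machine.  No backtracking is needed because the decision is deferred
until the character after the number is seen.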

Looking at the code in c-format.c, I do see that the current code
issues a warning for a zero field width.  To implement that with the
simple state machine would be more tedious.  It would be necessary to
introduce a way to emit a warning message, and have a state which
represented "one or more 0 characters not followed by $".  I'm not
sure where to strike the balance between simplicity of design and
ease of use here.
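
To make the extra state concrete, here is a small Python sketch of a
machine with a "zeros" state that emits a warning when one or more 0
characters are not followed by $.  The function name, the state names
and the warning text are hypothetical, and the sketch ignores flags
entirely, so a leading 0 is always treated as part of a number.

```python
DIGITS = "0123456789"
NONZERO = "123456789"

def check_width(spec):
    """Return the warnings produced for the text following a '%'."""
    warnings = []
    state = "after_percent"
    for ch in spec:
        if state == "after_percent":
            if ch == "0":
                state = "zeros"          # only zeros seen so far
            elif ch in NONZERO:
                state = "number"
            else:
                break                    # no number present
        elif state == "zeros":
            if ch == "0":
                pass                     # still only zeros
            elif ch in NONZERO:
                state = "number"         # e.g. "05": a nonzero number
            elif ch == "$":
                state = "after_percent"  # it was an operand number
            else:
                # One or more zeros, not followed by '$'.
                warnings.append("zero width in format")
                break
        elif state == "number":
            if ch in DIGITS:
                pass
            elif ch == "$":
                state = "after_percent"
            else:
                break                    # ordinary nonzero width
    return warnings
```

The cost is visible even in this toy version: one more state, plus a
side channel for messages that the pure type-matching machine did not
need.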

> > Since the goal is to produce an attribute string, we can see that it's
> > pretty easy to describe this kind of state machine using a little
> > language.  LABEL is [0-9]+.  CHAR is any character in the string.
> > TYPENAME is any string, meant to be the name of a type.
> 
> I think an interface taking some form of list of strings and types would 
> be better, to avoid calling back into the lexer and parser to decode type 
> names extracted from the string.

Good idea.  In that case we would have a string followed by a list of
types, and instead of {int} we would write {2} to mean the third type
in the list (assuming zero-based indexing).
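
A minimal Python sketch of that lookup follows.  Only the {N}
notation comes from the discussion above; the function name, the use
of plain strings to stand in for types, and the sample description
string are assumptions for illustration and do not reflect the real
syntax of the little language.

```python
import re

def referenced_types(description, types):
    """Return the types named by each {N} in description, in order.

    N is a zero-based index into the separately supplied type list,
    so no type name ever has to be parsed out of the string.
    """
    result = []
    for match in re.finditer(r"\{([0-9]+)\}", description):
        index = int(match.group(1))
        if index >= len(types):
            raise IndexError("type index %d out of range" % index)
        result.append(types[index])
    return result
```

With the list ["char *", "long", "int"], a description containing {2}
and then {0} yields "int" followed by "char *".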

> > In any case, what would make this useful is to be able to say "this
> > attribute string is printf plus the following".  Then the string would
> > add to and override the state machine created from the default printf
> > string.
> 
> Which in the state machine model is difficult to do because it depends on 
> fine details of how the printf state machine is implemented.

This turns out not to be the case.  You construct two state machines,
one for standard printf and one for the new string.  You run them in
parallel.  You use only the type matching specified by the new state
machine, except when it does not indicate a transition.  In that case
you revert to the printf state machine.  When the printf state machine
returns to state 0, you start running both state machines in parallel
again.
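
The following Python sketch shows one way the two machines could be
run side by side.  The table layout (state -> character -> (next
state, type)), the function name run_combined and the example
machines are all assumptions made for this illustration.  In
particular, restarting both machines when the override machine itself
returns to state 0 is an assumption of the sketch; the description
above only covers the fallback case.

```python
def run_combined(base, override, text):
    """Run two format state machines in parallel over text.

    Each machine maps state -> {char: (next_state, type_or_None)}.
    Returns the list of argument types that were matched.
    """
    matched = []
    base_state, over_state = 0, 0        # None means "not running"
    for ch in text:
        if base_state == 0 and over_state == 0 and ch != "%":
            continue                     # literal text between conversions
        over_step = None
        if over_state is not None:
            over_step = override.get(over_state, {}).get(ch)
        base_step = None
        if base_state is not None:
            base_step = base.get(base_state, {}).get(ch)
        if over_step is not None:
            # The new machine has a transition: its type matching wins.
            over_state, arg_type = over_step
            base_state = base_step[0] if base_step is not None else None
            finished = over_state == 0
        elif base_step is not None:
            # No transition in the new machine: revert to the base one.
            over_state = None
            base_state, arg_type = base_step
            finished = base_state == 0
        else:
            raise ValueError("no transition for %r" % ch)
        if arg_type is not None:
            matched.append(arg_type)
        if finished:
            base_state, over_state = 0, 0   # run both in parallel again
    return matched

# A tiny stand-in for the standard machine: %d, %s, %ld and %%.
BASE = {
    0: {"%": (1, None)},
    1: {"d": (0, "int"), "s": (0, "char *"), "l": (2, None), "%": (0, None)},
    2: {"d": (0, "long")},
}
# A hypothetical extension: adds %v and overrides %d.
OVERRIDE = {
    0: {"%": (1, None)},
    1: {"v": (0, "struct vec *"), "d": (0, "long long")},
}
```

Here run_combined(BASE, OVERRIDE, "%d %s %ld %v") takes the type for
%d and %v from the override machine and the types for %s and %ld from
the base machine.  Neither table needs to know how the other numbers
its states, which is the point of running them in parallel.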

Ian

