Support for %d$c format specifier in diagnostics.c

Ishikawa ishikawa@yk.rim.or.jp
Sat Jul 5 00:15:00 GMT 2003


Zack Weinberg wrote:
> 
> Ishikawa <ishikawa@yk.rim.or.jp> writes:
> 
> >> LC_MESSAGES=C should disable translation without any other effect.
> >> (Not all systems support this variable, however.)
> >
> > Thank you for the info. Hmm, I am not sure which system supports
> > LC_MESSAGES and which doesn't.
> If you are interested in I18N projects, another worthwhile improvement
> would be to add support for %n$s notation to diagnostic.c.  See
> <http://www.opengroup.org/onlinepubs/007904975/functions/printf.html> for
> details of how this is supposed to work.
> 
> I'll be happy to answer questions as you have them.

As per Zack's suggestion, I am working on an initial cut of
support for the %n$s notation in diagnostic.c.
(This is not strictly I18N/L10N stuff, though.)

Anyway, here is an overview of what I am coding.
Comments and feedback are welcome.

===

Support for %n$c format specifier in diagnostic.c


Given Conditions/Limitations:

      In diagnostic.c, a field width passed as an argument is
      supported only for the string 's' specifier.
      I won't change this (and it simplifies the
      handling of the %n$c format specifier).


Usage Example:

output_printf (buffer, "%1$d %3$s %2$c", intvalue, charvalue, "string")

To support the above, we need to scan the whole format string to find
out the type of each i-th value passed to output_format.
(For example, to get to the 3rd argument for %3$s, we need to
know the types of the 1st and 2nd arguments in order to advance the
pointer to the value in question.)

So when we see even ONE format specifier of the form %1$d (1
could be 2, 3, 4...),
we scan the whole format string and
build an internal array that defines the type of the i-th argument.
(Afterward we scan the argument list and record the
values of the arguments themselves.)
(UNLESS at least one format specifier of the form
%n$c appears, we won't incur the overhead of the array construction.)
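
The first pass described above could be sketched roughly as follows.
This is only an illustration, not the code in diagnostic.c: the names
scan_arg_types, arg_type and MAX_POS_ARGS are made up for this example,
and only the 'd', 'c' and 's' conversions are recognized. A return
value of 0 means no %n$ specifier was seen, so the caller can skip
building any further tables.

```c
#define MAX_POS_ARGS 9   /* assumption: only %1$ .. %9$ are handled */

/* Hypothetical type tags for this sketch. */
enum arg_type { ARG_UNUSED = 0, ARG_INT, ARG_CHAR, ARG_STRING };

/* Scan FMT and record the type of each positional argument in TYPES.
   Returns the highest position used (0 if no %n$ specifier appears),
   or -1 on a malformed or conflicting specifier.  */
static int scan_arg_types(const char *fmt, enum arg_type types[MAX_POS_ARGS])
{
    int highest = 0;

    for (int i = 0; i < MAX_POS_ARGS; i++)
        types[i] = ARG_UNUSED;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;                          /* character after '%' */
        if (*p == '\0')
            return -1;                /* dangling '%' at end of string */
        if (*p == '%')
            continue;                 /* literal "%%" */
        if (!(p[0] >= '1' && p[0] <= '9' && p[1] == '$'))
            continue;                 /* not a positional specifier */

        int pos = p[0] - '0';
        enum arg_type t;
        switch (p[2]) {
        case 'd': t = ARG_INT;    break;
        case 'c': t = ARG_CHAR;   break;
        case 's': t = ARG_STRING; break;
        default:  return -1;      /* conversion not known to this sketch */
        }

        /* A repeated definition is accepted only if the type matches. */
        if (types[pos - 1] != ARG_UNUSED && types[pos - 1] != t)
            return -1;
        types[pos - 1] = t;
        if (pos > highest)
            highest = pos;
        p += 2;                       /* now at the conversion character */
    }
    return highest;
}
```

For "%1$d %3$s %2$c" this fills the array with int, char, string in
argument order, regardless of the order in which the specifiers appear.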

Requirement/Caution:

The size of parameters pushed on the stack can vary (say, int vs. long
long, and/or a 64-bit pointer vs. a 32-bit int, etc.), so we must KNOW the
size of each argument before constructing the argument value
list.  (This is why we use the two-pass algorithm below: one pass over the
format string, and then one over the arguments.)
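
To illustrate why the types must be known first, here is a rough sketch
of the second pass. The names collect_values, collect_values_v,
arg_type and arg_value are hypothetical, not taken from diagnostic.c.
Each value is fetched with va_arg using the type recorded for its
position, and stored in a union so that differently sized arguments
can share one array. Where the real code might abort(), this sketch
just returns -1 so that it is easy to try out.

```c
#include <stdarg.h>

/* Hypothetical type tags for this sketch. */
enum arg_type { ARG_UNUSED = 0, ARG_INT, ARG_CHAR, ARG_STRING };

/* One slot per argument; the union is as large as its largest member. */
union arg_value {
    int i;             /* 'd', and 'c' (char is promoted to int) */
    const char *s;     /* 's' */
};

/* Walk AP strictly in order, pulling each argument with the type
   recorded for it.  Returns 0 on success, -1 if some position up to
   COUNT has no recorded type (we cannot know how to step over it).  */
static int collect_values_v(const enum arg_type *types, int count,
                            union arg_value *values, va_list ap)
{
    for (int i = 0; i < count; i++) {
        switch (types[i]) {
        case ARG_INT:
        case ARG_CHAR:
            values[i].i = va_arg(ap, int);
            break;
        case ARG_STRING:
            values[i].s = va_arg(ap, char *);
            break;
        default:
            return -1;   /* hole in the argument list */
        }
    }
    return 0;
}

/* Variadic convenience wrapper around collect_values_v.  */
static int collect_values(const enum arg_type *types, int count,
                          union arg_value *values, ...)
{
    va_list ap;
    va_start(ap, values);
    int rc = collect_values_v(types, count, values, ap);
    va_end(ap);
    return rc;
}
```

Once the values are stored this way, %3$s can be served from values[2]
directly, no matter where it appears in the format string.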

Passing the correct types in the format string is
and has always been the responsibility of the caller.

[] %n$c format specifier usage rule: ???

Now, I am not entirely sure of the usage
rules for the %n$c format specifier.

It seems to me that ONCE we use the %n$ construct,
we probably need to use this form from that point on.
(Or we need to use this form for ALL the format specifiers.)

OK	    %1$d %3$c %2$s %6$d %5$s %4$c

OK	    %d %s %c %6$d %5$s %4$c

ambiguous   %d %s %c %6$d %s %4$c
            1  2  3   4   5  6

In the last example, which argument should the fifth format
specifier (the plain %s) be applied to?  (So we probably should not
permit it.)  Does anyone know for sure?
The given URL was not very clear on this, but the example there
showed a format string with ALL the specifiers in %n$c form.
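
For what it is worth, the POSIX printf description says a format may
use either numbered or unnumbered specifiers but not both (apart from
%%), and that the result of mixing them is undefined. Under that
strict reading, a check along the following lines would reject both
the second and the third example above. This is only a sketch: the
names classify_format and fmt_style are invented here, and flags and
precision are not parsed.

```c
#include <ctype.h>

/* Hypothetical classification used only in this sketch. */
enum fmt_style { FMT_PLAIN, FMT_POSITIONAL, FMT_MIXED };

/* Classify FMT by the kind of specifiers it contains.  A string with
   no specifiers at all is reported as FMT_PLAIN.  */
static enum fmt_style classify_format(const char *fmt)
{
    int plain = 0, positional = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;                          /* character after '%' */
        if (*p == '\0')
            break;                    /* dangling '%' */
        if (*p == '%')
            continue;                 /* "%%" may appear in either style */

        /* Digits followed by '$' mark a numbered specifier;
           digits followed by anything else are just a field width.  */
        const char *q = p;
        while (isdigit((unsigned char)*q))
            q++;
        if (q > p && *q == '$')
            positional++;
        else
            plain++;
    }

    if (plain > 0 && positional > 0)
        return FMT_MIXED;
    return positional > 0 ? FMT_POSITIONAL : FMT_PLAIN;
}
```

A caller could abort on FMT_MIXED and build the type array only for
FMT_POSITIONAL.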

[] Current implementation overview.

I am writing preliminary code right now.

My current algorithm is as follows.

	First, we scan the format specifier list and
	record the type of the i-th argument.

	Before setting a type, we should check for a double definition
	(which is OK?), an incompatible double definition, etc.

	Second, using the array [] of the types of the i-th arguments,
	we scan the real argument list and
	record the values.
	If we find an undefined type for a used arg, then
	we must abort();

Limitation:
	My first implementation will
	handle only 1-9 range: 
	%1$... %9$...
	This should suffice for GCC usage.
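
With the 1-9 restriction, recognizing the position is a two-character
lookahead. A minimal sketch (parse_position is a hypothetical helper
name, not something in diagnostic.c):

```c
/* P points just past a '%'.  Returns the argument position 1..9 if P
   starts with "<digit>$", or 0 if this is not a positional specifier
   that the single-digit scheme can handle.  */
static int parse_position(const char *p)
{
    if (p[0] >= '1' && p[0] <= '9' && p[1] == '$')
        return p[0] - '0';
    return 0;
}
```

Note that "%10$d" is not recognized by this check, so a format string
with ten or more positional arguments would need the limit raised.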


Comment/feedback welcome.

-- 
int main(void){int j=2003;/*(c)2003 cishikawa. */
char t[] ="<CI> @abcdefghijklmnopqrstuvwxyz.,\n\"";
char *i ="g>qtCIuqivb,gCwe\np@.ietCIuqi\"tqkvv is>dnamz";
while(*i)((j+=strchr(t,*i++)-(int)t),(j%=sizeof t-1),
(putchar(t[j])));return 0;}/* under GPL */


