This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Checking format specifiers
"Joseph S. Myers" wrote:
>
> On Tue, 22 Jul 2003, Wolfgang Bangerth wrote:
>
> > Also, I can only again express my confusion how it is possible that the
> > translation project provides us with translations that have 400 errors in
> > roughly 5000 message strings. It is not a matter of just sending them a
> > polite request to change a file or two -- they need to change their
> > processes. I'm not in a position to do that, I guess.
>
> Translation Project: could you use the script in
> <http://gcc.gnu.org/ml/gcc-patches/2003-07/msg02191.html> as a required
> check on GCC translations before they are accepted by the TP site, and
> send details of the problems it shows up in the current translations to
> the translators to fix?
I have hacked xgettext and msgfmt from GNU gettext
so that they understand the extended format character
set used within GCC's internal diagnostic routines.
Someone might want to take a look
and use the patched version for future testing and checking.
I have already sent this to bug-gnu-gettext@gnu.org
and am sending it to the Translation Project mailing list as well.
--- begin quote ---
(I modified xgettext.c as well, now.)
ADDITION:
After reading the gettext HTML documentation,
I realized that xgettext also needs to understand the
extended format characters that are used by GCC's internal
diagnostic routines. Otherwise, the c-format flag
is not produced for messages that use them.
I ran xgettext on a small C file to verify my understanding.
For the following file, the unmodified xgettext failed to
produce the c-format flag line for the third error message.
--------
#include <libintl.h>
#define _(String) gettext(String)
test()
{
int i;
error(_("This is a error message.\n"));
error(_("This is a error message with standard C format specifier. %d = \n"), i);
error(_("Non standard (GNU CC internal extension.) %T \n"), &i);
}
---------
So I modified xgettext as well.
With the --gcc option, xgettext now produces the c-format
flag for the third message.
The patch at the end now includes the
xgettext.c changes as well.
So in a nutshell, for GCC message translation work:
please run xgettext with the additional --gcc flag to extract messages, and
run msgfmt with the additional --gcc flag to check the
translated messages.
The updated patch follows at the end.
-------
Request for modification of msgfmt.c and format-c.c
Hi,
I am trying to get positional format specifier support
into GCC's internal diagnostic routines.
These routines are invoked to emit warning/error messages
when the GCC compilers encounter problems in the source code they compile.
In the process, I ran into a problem with
msgfmt and the extended format character set used by
the internal diagnostic routines of GCC.
msgfmt only understands the standard C printf format characters,
whereas the internal diagnostic routines of GCC use a few additional
characters for formatting.
There are many warning/error messages that use such extended
format characters in GCC source files.
Since msgfmt doesn't understand these
characters, checking of the *.po files fails. I suspect that once such
extended format characters were introduced, messages in the GCC *.po files
that contain them were no longer marked as
c-format and so were ignored by the msgfmt check.
This is wrong and undesirable.
(There are many untranslated/fuzzy messages in various *.po files,
and I suspect one reason is this lack of checking of
the format characters by msgfmt.)
I created a patch that adds support for the extended format
character set used by the internal GCC diagnostic routines, so that
msgfmt checks the c-format-like messages assuming
the extended format character set.
The added feature is invoked as follows:
msgfmt --gcc --check
or
msgfmt -G --check
With either form, msgfmt checks format string consistency and understands the
extended character set.
It may not be perfect, but I believe this patch
is a step in the right direction.
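To make the idea of a "format string consistency" check concrete, here is a minimal sketch in C. It is not the gettext implementation: collect_conversions() and formats_agree() are hypothetical names, the parser ignores '*' widths and "N$" positional arguments, and the list of GCC extension characters is taken from the cp_printer() comment in the patch. 'L' is deliberately not treated as a length modifier here, to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Standard printf conversions and the GCC diagnostic extensions
   (the latter as listed in the cp_printer() comment of the patch). */
static const char STD_CONV[] = "diouxXeEfFgGaAcspn";
static const char GCC_CONV[] = "ACDEFLOPQTV";

/* Hypothetical helper: store the conversion character of every directive
   in FMT into OUT (NUL-terminated, at most CAP bytes).  Returns false if
   a directive ends in a character that is not accepted, or if OUT is too
   small.  */
static bool
collect_conversions (const char *fmt, bool gcc_ext, char *out, size_t cap)
{
  size_t n = 0;

  assert (fmt != NULL && out != NULL && cap > 0);
  while (*fmt != '\0')
    {
      if (*fmt++ != '%')
        continue;
      if (*fmt == '%')          /* "%%" is a literal percent sign */
        {
          fmt++;
          continue;
        }
      /* Skip flags, width and precision (simplified). */
      fmt += strspn (fmt, "-+ #0");
      fmt += strspn (fmt, "0123456789");
      if (*fmt == '.')
        fmt += 1 + strspn (fmt + 1, "0123456789");
      /* Skip the length modifiers h, l, j, z, t. */
      fmt += strspn (fmt, "hljzt");
      if (*fmt == '\0'
          || !(strchr (STD_CONV, *fmt) != NULL
               || (gcc_ext && strchr (GCC_CONV, *fmt) != NULL)))
        return false;           /* unterminated or unknown directive */
      if (n + 1 >= cap)
        return false;
      out[n++] = *fmt++;
    }
  out[n] = '\0';
  return true;
}

/* Hypothetical check: msgid and msgstr agree if both parse and use the
   same conversions in the same order.  */
static bool
formats_agree (const char *msgid, const char *msgstr, bool gcc_ext)
{
  char a[64], b[64];

  return collect_conversions (msgid, gcc_ext, a, sizeof a)
         && collect_conversions (msgstr, gcc_ext, b, sizeof b)
         && strcmp (a, b) == 0;
}
```

With gcc_ext set to false, a msgid such as "type %T" is rejected outright, which corresponds to running msgfmt without --gcc. With gcc_ext set to true, the same msgid is accepted and compared against its translation.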
One other thing.
msgfmt did not print a warning or error when a msgid
contained an unrecognized format character; it simply skipped
the check altogether. This is undesirable IMHO.
So I added an error message for this case.
---
The following is a check against a modified ja.po of GCC.
I modified a few lines in ja.po so that
the extended format character set is used in
messages marked as c-format.
Without the "--gcc" flag, the modified msgfmt reports errors.
LC_ALL=C
ishikawa@duron$ export LC_ALL
ishikawa@duron$ /usr/local/bin/msgfmt --check ja.po
ja.po:14525: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'T' is not a valid conversion specifier.
ja.po:14539: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'D' is not a valid conversion specifier.
ja.po:14548: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'T' is not a valid conversion specifier.
ja.po:14557: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'T' is not a valid conversion specifier.
ja.po:14761: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'D' is not a valid conversion specifier.
ja.po:14766: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character ' ' is not a valid conversion specifier.
/usr/local/bin/msgfmt: found 6 fatal errors
The last error message above is a very subtle case: "%L " is a
valid internal GCC diagnostic format extension. However, the XPG printf
spec seems to treat the "L" in "%L" as a prefix to "d" and other numeric
conversion specifiers, so when a ' ' follows, msgfmt prints an error
mentioning ' '.
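The "%L " case can be illustrated with a small sketch in C. This is only a simplified model of the behaviour described above, not code from gettext or from the patch: conversion_char() is a hypothetical helper, it ignores flags, width and precision, and the list of standard conversions is abbreviated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Standard printf conversion characters (simplified list). */
static const char STD_CONV[] = "diouxXeEfFgGaAcspn";

/* Hypothetical helper, not part of gettext: return the character that a
   printf-style parser would take as the conversion specifier of
   DIRECTIVE, which must start with '%'.  */
static char
conversion_char (const char *directive, bool gcc_ext)
{
  const char *p;

  assert (directive != NULL && directive[0] == '%');
  p = directive + 1;
  if (*p != 'L')
    return *p;
  /* 'L' followed by a standard conversion is read as a size prefix. */
  if (p[1] != '\0' && strchr (STD_CONV, p[1]) != NULL)
    return p[1];
  /* Otherwise: in GCC mode 'L' itself is the conversion; in standard
     mode the parser reports the character that follows 'L'.  */
  return gcc_ext ? 'L' : p[1];
}
```

For the directive "%L " this returns ' ' in standard mode, which is the character the error message complains about, and 'L' in GCC mode. The sketch also shows why the workaround depends on "%L" being followed by a character that is not a standard conversion.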
With the "--gcc" flag, the modified msgfmt checks the file
assuming the extended format character set, and
it produces no errors. Good.
ishikawa@duron$ /usr/local/bin/msgfmt --check --gcc ja.po
I checked various error combinations, and
the modified msgfmt seems to detect them
very well.
Index: format-c.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/format-c.c,v
retrieving revision 1.2
diff -c -3 -p -r1.2 format-c.c
*** format-c.c 24 Feb 2003 10:54:10 -0000 1.2
--- format-c.c 31 Jul 2003 17:05:28 -0000
***************
*** 20,25 ****
--- 20,27 ----
# include <config.h>
#endif
+ #include <stdio.h>
+
#include <stdbool.h>
#include <stdlib.h>
*************** enum format_arg_type
*** 132,138 ****
| FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
| FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
| FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
! | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T)
};
struct numbered_arg
--- 134,176 ----
| FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
| FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
| FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
! | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T),
!
!
! /* extension for GCC internal diagnostic format routine. */
! /* We dont care what they are. Just be sure to treat these
! as characters to stand for opaque types.
! cp_printer() in gcc/gcc/cp/error.c
! recognize these as individual/different specifiers.
! case 'A': result = args_to_string (x_next_tree, verbose); break;
! case 'C': result = code_to_string (x_next_tcode); break;
! case 'D': result = decl_to_string (x_next_tree, verbose); break;
! case 'E': result = expr_to_string (x_next_tree); break;
! case 'F': result = fndecl_to_string (x_next_tree, verbose); break;
! case 'L': result = language_to_string (x_next_lang); break;
! case 'O': result = op_to_string (x_next_tcode); break;
! case 'P': result = parm_to_string (x_next_int); break;
! case 'Q': result = assop_to_string (x_next_tcode); break;
! case 'T': result = type_to_string (next_tree, verbose); break;
! case 'V': result = cv_to_string (next_tree, verbose); break;
!
! c-objc-common.c uses 'D', 'F', 'T', and 'E'. They are covered by above.
! toplev.c uses 'D', 'F', and 'T'.
!
! Because of the size mask, the following may not work well.
! 'L' needs special processing.
! */
!
! FAT_GCC_EXT_A = FAT_SIZE_PTRDIFF_T + 1,
! FAT_GCC_EXT_C,
! FAT_GCC_EXT_D,
! FAT_GCC_EXT_F,
! FAT_GCC_EXT_L,
! FAT_GCC_EXT_O,
! FAT_GCC_EXT_P,
! FAT_GCC_EXT_Q,
! FAT_GCC_EXT_T,
! FAT_GCC_EXT_V
};
struct numbered_arg
*************** struct spec
*** 163,168 ****
--- 201,209 ----
#define isdigit(c) ((unsigned int) ((c) - '0') < 10)
+ int gcc_warning_extended_format_check;
+
+
static int
numbered_arg_compare (const void *p1, const void *p2)
{
*************** format_parse (const char *format, char *
*** 630,635 ****
--- 671,715 ----
type |= (size & FAT_SIZE_MASK);
break;
default:
+ if(gcc_warning_extended_format_check)
+ {
+ if(format[-1] == 'L')
+ {
+ /* L is understood by XPG C format spec, too.
+ So if we ever want to use this msgfmt.c to
+ check for format extension
+ used within GCC's diagnostic extension,
+ 'L' should be followed by characters
+ not understood by above case statements.
+ For example, ' ' (space) would do.
+ I checked the ja.po for GCC and found that %L
+ is indeed followed by ' '.
+ */
+ type = FAT_GCC_EXT_L; break;
+ }
+ #define SET_GCC_EXTENDED(label,c) case label : type = FAT_GCC_EXT_##c; break;
+ switch(*format)
+ {
+ SET_GCC_EXTENDED('A',A);
+ SET_GCC_EXTENDED('C',C);
+ SET_GCC_EXTENDED('D',D);
+ SET_GCC_EXTENDED('F',F);
+ SET_GCC_EXTENDED('L',L);
+ SET_GCC_EXTENDED('O',O);
+ SET_GCC_EXTENDED('P',P);
+ SET_GCC_EXTENDED('Q',Q);
+ SET_GCC_EXTENDED('T',T);
+ SET_GCC_EXTENDED('V',V);
+
+ default:
+ goto bad; /* unknown */
+ }
+ #undef SET_GCC_EXTENDED
+ break; /* get out of the outer switch. */
+
+ }
+ bad:;
+
*invalid_reason =
(*format == '\0'
? INVALID_UNTERMINATED_DIRECTIVE ()
Index: msgfmt.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/msgfmt.c,v
retrieving revision 1.9
diff -c -3 -p -r1.9 msgfmt.c
*** msgfmt.c 29 Apr 2003 10:12:15 -0000 1.9
--- msgfmt.c 31 Jul 2003 17:05:30 -0000
*************** static int exit_status;
*** 78,83 ****
--- 78,86 ----
/* If true include even fuzzy translations in output file. */
static bool include_all = false;
+ /* If true, do gcc diagnostic routine() extended format character check */
+ extern int gcc_warning_extended_format_check; /* in format-c.c */
+
/* Specifies name of the output file. */
static const char *output_file_name;
*************** static const struct option long_options[
*** 155,160 ****
--- 158,166 ----
{ "check-format", no_argument, NULL, CHAR_MAX + 3 },
{ "check-header", no_argument, NULL, CHAR_MAX + 4 },
{ "directory", required_argument, NULL, 'D' },
+
+ { "gcc", no_argument, NULL, 'G' },
+
{ "help", no_argument, NULL, 'h' },
{ "java", no_argument, NULL, 'j' },
{ "java2", no_argument, NULL, CHAR_MAX + 5 },
*************** main (int argc, char *argv[])
*** 214,220 ****
bindtextdomain (PACKAGE, relocate (LOCALEDIR));
textdomain (PACKAGE);
! while ((opt = getopt_long (argc, argv, "a:cCd:D:fhjl:o:Pr:vV", long_options,
NULL))
!= EOF)
switch (opt)
--- 220,226 ----
bindtextdomain (PACKAGE, relocate (LOCALEDIR));
textdomain (PACKAGE);
! while ((opt = getopt_long (argc, argv, "a:cCd:D:fGhjl:o:Pr:vV", long_options,
NULL))
!= EOF)
switch (opt)
*************** main (int argc, char *argv[])
*** 248,253 ****
--- 254,264 ----
case 'f':
include_all = true;
break;
+
+ case 'G':
+ gcc_warning_extended_format_check = true;
+ break;
+
case 'h':
do_help = true;
break;
*************** Input file interpretation:\n"));
*** 586,591 ****
--- 597,604 ----
menu items\n"));
printf (_("\
-f, --use-fuzzy use fuzzy entries in output\n"));
+ printf (_("\
+ -G, --gcc check with GCC internal format extension.\n"));
printf ("\n");
printf (_("\
Output details:\n"));
*************** check_pair (const char *msgid,
*** 1167,1173 ****
msgid_plural == NULL,
true, pretty_msgstr))
exit_status = EXIT_FAILURE;
!
parser->free (msgstr_descr);
}
else
--- 1180,1186 ----
msgid_plural == NULL,
true, pretty_msgstr))
exit_status = EXIT_FAILURE;
!
parser->free (msgstr_descr);
}
else
*************** check_pair (const char *msgid,
*** 1188,1195 ****
parser->free (msgid_descr);
}
else
! free (invalid_reason);
! }
if (check_accelerators && msgid_plural == NULL)
/* Test 4: Check that if msgid is a menu item with a keyboard accelerator,
--- 1201,1221 ----
parser->free (msgid_descr);
}
else
! {
! /* we have forgotten to output error here?! */
! error_with_progname = false;
! error_at_line (0, 0, msgid_pos->file_name,
! msgid_pos->line_number,
! _("\
! '%s' is not a valid %s format string. Reason: %s"),
! "msgid", format_language_pretty[i],
! invalid_reason);
! error_with_progname = true;
! exit_status = EXIT_FAILURE;
!
! free (invalid_reason);
! }
! }
if (check_accelerators && msgid_plural == NULL)
/* Test 4: Check that if msgid is a menu item with a keyboard accelerator,
Index: xgettext.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/xgettext.c,v
retrieving revision 1.16
diff -c -3 -p -r1.16 xgettext.c
*** xgettext.c 27 Jun 2003 12:35:05 -0000 1.16
--- xgettext.c 31 Jul 2003 17:05:32 -0000
***************
*** 80,85 ****
--- 80,88 ----
#include "x-glade.h"
+ /* If true, do gcc diagnostic routine() extended format character check */
+ extern int gcc_warning_extended_format_check; /* in format-c.c */
+
/* If nonzero add all comments immediately preceding one of the keywords. */
static bool add_all_comments = false;
*************** static const struct option long_options[
*** 159,164 ****
--- 162,168 ----
{ "force-po", no_argument, &force_po, 1 },
{ "foreign-user", no_argument, NULL, CHAR_MAX + 2 },
{ "from-code", required_argument, NULL, CHAR_MAX + 3 },
+ { "gcc", no_argument, NULL, 'G'},
{ "help", no_argument, NULL, 'h' },
{ "indent", no_argument, NULL, 'i' },
{ "join-existing", no_argument, NULL, 'j' },
*************** main (int argc, char *argv[])
*** 244,250 ****
xgettext_global_source_encoding = po_charset_ascii;
while ((optchar = getopt_long (argc, argv,
! "ac::Cd:D:eEf:Fhijk::l:L:m::M::no:p:sTVw:x:",
long_options, NULL)) != EOF)
switch (optchar)
{
--- 248,254 ----
xgettext_global_source_encoding = po_charset_ascii;
while ((optchar = getopt_long (argc, argv,
! "ac::Cd:D:eEf:FGhijk::l:L:m::M::no:p:sTVw:x:",
long_options, NULL)) != EOF)
switch (optchar)
{
*************** main (int argc, char *argv[])
*** 296,301 ****
--- 300,308 ----
case 'f':
files_from = optarg;
break;
+ case 'G':
+ gcc_warning_extended_format_check = true;
+ break;
case 'F':
sort_by_filepos = true;
break;
*************** Choice of input file language:\n"));
*** 663,668 ****
--- 670,678 ----
-C, --c++ shorthand for --language=C++\n"));
printf (_("\
By default the language is guessed depending on the input file name extension.\n"));
+ printf (_("\
+ -G, --gcc support internally used GCC diagnostic format\n"));
+
printf ("\n");
printf (_("\
Input file interpretation:\n"));
--- end quote
---
--
int main(void){int j=2003;/*(c)2003 cishikawa. */
char t[] ="<CI> @abcdefghijklmnopqrstuvwxyz.,\n\"";
char *i ="g>qtCIuqivb,gCwe\np@.ietCIuqi\"tqkvv is>dnamz";
while(*i)((j+=strchr(t,*i++)-(int)t),(j%=sizeof t-1),
(putchar(t[j])));return 0;}/* under GPL */