Checking format specifiers

Ishikawa ishikawa@yk.rim.or.jp
Thu Jul 31 17:22:00 GMT 2003


"Joseph S. Myers" wrote:
> 
> On Tue, 22 Jul 2003, Wolfgang Bangerth wrote:
> 
> > Also, I can only again express my confusion how it is possible that the
> > translation project provides us with translations that have 400 errors in
> > roughly 5000 message strings. It is not a matter of just sending them a
> > polite request to change a file or two -- they need to change their
> > processes. I'm not in a position to do that, I guess.
> 
> Translation Project: could you use the script in
> <http://gcc.gnu.org/ml/gcc-patches/2003-07/msg02191.html> as a required
> check on GCC translations before they are accepted by the TP site, and
> send details of the problems it shows up in the current translations to
> the translators to fix?

I have hacked xgettext and msgfmt from GNU gettext
so that they understand the extended format character
set used within GCC's internal diagnostic routines.

Someone might want to take a look
and use the patched version for future testing/checking, etc.

I have already sent this to bug-gnu-gettext@gnu.org
and am sending it to the Translation Project mailing list as well.

--- begin quote ---

(I modified xgettext.c as well, now.)


ADDITION:

After reading the gettext HTML documentation,
I realized that xgettext also needs to understand the
extended format characters that are used by GCC's internal
diagnostic routines. Otherwise, the c-format flag
is not produced for messages that use them.

I ran xgettext on a small C file to verify my understanding.
For the following file, unmodified xgettext failed to
produce the c-format flag line for the third error message.
--------
#include <libintl.h>
#define _(String) gettext(String) 
test()
{
  int i;
  error(_("This is a error message.\n"));
  error(_("This is a error message with standard C format specifier. %d = \n"), i);
  error(_("Non standard (GNU CC internal extension.) %T \n"), &i);
}
---------

So I modified xgettext as well.
With the --gcc option, xgettext now produces the c-format
flag for the third message.
The patch at the end now includes the
xgettext.c patch as well.

So in a nutshell, for GCC suite message translation work,
please run xgettext with the additional --gcc flag to extract messages,
and run msgfmt with the additional --gcc flag to check the
translated messages.

New modified PATCH follows at the end.

-------

Request for modification of msgfmt.c and format-c.c

Hi, 

I am trying to get positional format specifier support
into GCC's internal diagnostic routines.
These routines are invoked to emit warning/error messages
when GCC compilers encounter problems in the source code they compile.

In the process, I found a problem with how
msgfmt handles the extended format character set used by
the internal diagnostic routines of GCC.

msgfmt only understands the standard C printf format characters,
whereas the internal diagnostic routines of GCC use a few additional
characters for formatting.
There are many warning/error messages that use such extended
format characters in GCC source files.
Since msgfmt doesn't understand these
characters, checking of *.po files will fail.  I suspect that once such
extended format characters were introduced, messages in GCC *.po files
that contain them were no longer marked as
c-format and were therefore ignored by msgfmt checking.

This is wrong and undesirable.
(There are many untranslated/fuzzy messages in various *.po files
and I suspect one reason is this lack of checking of 
the used format characters by msgfmt.)

I created a patch to incorporate support for the extended format
character set used by the internal GCC diagnostic routines so that
msgfmt checks the c-format-like messages assuming
the extended format character set.

This added feature is invoked as follows:

msgfmt --gcc --check

or

msgfmt -G --check

With either form, msgfmt checks format string consistency and
understands the extended character set.
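
As a rough illustration of what "format string consistency" means here,
the following is a minimal sketch, not the gettext implementation. It
assumes every directive is a '%' followed by a single conversion
character, and it requires the msgid and msgstr to use the same
conversions in the same order (the real checker is more elaborate and
also handles flags, widths and positional arguments). The names
next_conversion and same_directives are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: advance *P past the next directive and return its
   conversion character, or '\0' when no directive remains.
   Simplifying assumption: a directive is '%' plus exactly one character,
   and "%%" is a literal percent sign, not a directive.  */
static char
next_conversion (const char **p)
{
  assert (p != NULL && *p != NULL);

  while (**p != '\0')
    {
      if (**p == '%')
        {
          char c = (*p)[1];
          if (c == '\0')
            {
              /* Lone '%' at the end of the string: nothing to report.  */
              *p += 1;
              return '\0';
            }
          *p += 2;
          if (c != '%')
            return c;
        }
      else
        *p += 1;
    }
  return '\0';
}

/* Return true if MSGID and MSGSTR contain the same conversion characters
   in the same order.  Because characters are compared as opaque values,
   GCC-style directives such as %T or %D need no special handling.  */
static bool
same_directives (const char *msgid, const char *msgstr)
{
  for (;;)
    {
      char a = next_conversion (&msgid);
      char b = next_conversion (&msgstr);
      if (a != b)
        return false;
      if (a == '\0')
        return true;
    }
}
```

With this sketch, a translation that keeps %T and %D in place passes,
while one that replaces %T with %d, or drops a directive, is rejected.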

It may not be perfect, but I believe this patch
is a step in the right direction.

One other thing.

msgfmt didn't print a warning/error if a msgid
contained an unrecognized format character; it skipped
the checking altogether. This is undesirable IMHO.
So I added error output for this case.
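
The difference between skipping silently and reporting can be sketched
as follows. This is only an illustration under simplifying assumptions,
not the code in the patch: the function name report_invalid_msgid is
hypothetical, the set of accepted conversion characters is a reduced
stand-in for the standard printf set, and the wording for an
unterminated directive is made up. The main message mirrors the style of
the msgfmt output shown in this mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan MSGID and, instead of silently giving up on
   the first unknown conversion character, report it on ERR together with
   the file name and line number.  Returns 1 if a problem was reported,
   0 if the string looks like a valid format string.
   Simplifying assumption: a directive is '%' plus one character.  */
static int
report_invalid_msgid (FILE *err, const char *file, int line,
                      const char *msgid)
{
  /* Reduced stand-in for the standard C conversion characters.  */
  static const char standard[] = "diouxXeEfgGcsp";
  int directive = 0;
  const char *p;

  assert (err != NULL && file != NULL && msgid != NULL);

  for (p = msgid; *p != '\0'; p++)
    {
      if (*p != '%')
        continue;
      p++;
      if (*p == '%')
        continue;               /* "%%" is a literal percent sign */
      directive++;
      if (*p == '\0')
        {
          /* Made-up wording for this sketch.  */
          fprintf (err, "%s:%d: 'msgid' ends in the middle of a directive.\n",
                   file, line);
          return 1;
        }
      if (strchr (standard, *p) == NULL)
        {
          fprintf (err, "%s:%d: 'msgid' is not a valid C format string. "
                   "Reason: In the directive number %d, the character '%c' "
                   "is not a valid conversion specifier.\n",
                   file, line, directive, *p);
          return 1;
        }
    }
  return 0;
}
```

A caller would add the return value to its error count, so that a msgid
such as "invalid %T" makes the overall check fail rather than pass
unnoticed.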

---
The following is a check against a modified ja.po of GCC.
I modified a few lines in ja.po so that
the extended format character set is used in
messages marked as c-format strings.

Without the "--gcc" flag, the modified msgfmt signals errors.

LC_ALL=C
ishikawa@duron$ export LC_ALL
ishikawa@duron$ /usr/local/bin/msgfmt --check ja.po
ja.po:14525: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'T' is not a valid conversion specifier.
ja.po:14539: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'D' is not a valid conversion specifier.
ja.po:14548: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'T' is not a valid conversion specifier.
ja.po:14557: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'T' is not a valid conversion specifier.
ja.po:14761: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character 'D' is not a valid conversion specifier.
ja.po:14766: 'msgid' is not a valid C format string. Reason: In the directive number 1, the character ' ' is not a valid conversion specifier.
/usr/local/bin/msgfmt: found 6 fatal errors

The last error message above is a very subtle case: "%L " is a
valid internal GCC diagnostic format extension. However, the XPG printf
spec seems to treat "L" as a size prefix for "d" and other numerical
conversion specifiers, so when a ' ' follows it, msgfmt prints an error
mentioning ' ' rather than 'L'.
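
The "%L " case can be illustrated with a small sketch. It is not the
code from the patch; classify_directive is a hypothetical helper, the
list of standard conversion characters is a simplified assumption, and
flags, width and precision are left out. It only models the rule
described above: 'L' is first consumed as a size prefix, and only when
no standard conversion follows does GCC mode accept it as an extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Result of looking at one directive (the text right after a '%').  */
enum directive_kind { DIR_STANDARD, DIR_GCC_EXTENSION, DIR_INVALID };

/* Hypothetical helper: classify the directive whose text starts at SPEC,
   just past the '%'.  GCC_MODE plays the role of the --gcc flag.  */
static enum directive_kind
classify_directive (const char *spec, int gcc_mode)
{
  /* Simplified assumption about the standard conversion characters.  */
  static const char standard[] = "diouxXeEfgGcsp%";
  /* GCC diagnostic extensions other than 'L', which is handled below.  */
  static const char gcc_ext[] = "ACDFOPQTV";
  int saw_size_L = 0;

  assert (spec != NULL);

  /* Like a printf-style parser, consume 'L' as a size prefix first.  */
  if (*spec == 'L')
    {
      saw_size_L = 1;
      spec++;
    }

  if (*spec != '\0' && strchr (standard, *spec) != NULL)
    return DIR_STANDARD;

  if (gcc_mode)
    {
      /* 'L' followed by something that is not a standard conversion,
         for example a space: treat the 'L' itself as the extension.  */
      if (saw_size_L)
        return DIR_GCC_EXTENSION;
      if (*spec != '\0' && strchr (gcc_ext, *spec) != NULL)
        return DIR_GCC_EXTENSION;
    }

  return DIR_INVALID;
}
```

In this sketch, "%L " is invalid without GCC mode (the offending
character is the space, as in the error message above) and an extension
with it, while "%Lf" stays a standard directive in both modes.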

With the "--gcc" flag, the modified msgfmt checks the file
assuming the extended format character set, and
it produces no errors. Good.


ishikawa@duron$ /usr/local/bin/msgfmt --check --gcc ja.po


I checked various error combinations and
the modified msgfmt seems to detect them
very well.



Index: format-c.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/format-c.c,v
retrieving revision 1.2
diff -c -3 -p -r1.2 format-c.c
*** format-c.c	24 Feb 2003 10:54:10 -0000	1.2
--- format-c.c	31 Jul 2003 17:05:28 -0000
***************
*** 20,25 ****
--- 20,27 ----
  # include <config.h>
  #endif
  
+ #include <stdio.h>
+ 
  #include <stdbool.h>
  #include <stdlib.h>
  
*************** enum format_arg_type
*** 132,138 ****
  			   | FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
  			   | FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
  			   | FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
! 			   | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T)
  };
  
  struct numbered_arg
--- 134,176 ----
  			   | FAT_SIZE_FAST8_T | FAT_SIZE_FAST16_T
  			   | FAT_SIZE_FAST32_T | FAT_SIZE_FAST64_T
  			   | FAT_SIZE_INTMAX_T | FAT_SIZE_INTPTR_T
! 			   | FAT_SIZE_SIZE_T | FAT_SIZE_PTRDIFF_T),
! 
! 
!   /* extension for GCC internal diagnostic format routine. */
!   /* We dont care what they are. Just be sure to treat these
!      as characters to stand for opaque types.
!    cp_printer() in gcc/gcc/cp/error.c
!      recognize these as individual/different specifiers. 
! 	case 'A': result = args_to_string (x_next_tree, verbose);	break;
! 	case 'C': result = code_to_string (x_next_tcode);	        break;
! 	case 'D': result = decl_to_string (x_next_tree, verbose);	break;
! 	case 'E': result = expr_to_string (x_next_tree);      	break;
! 	case 'F': result = fndecl_to_string (x_next_tree, verbose);	break;
! 	case 'L': result = language_to_string (x_next_lang);          break;
! 	case 'O': result = op_to_string (x_next_tcode);       	break;
! 	case 'P': result = parm_to_string (x_next_int);	        break;
! 	case 'Q': result = assop_to_string (x_next_tcode);	        break;
! 	case 'T': result = type_to_string (next_tree, verbose);	break;
! 	case 'V': result = cv_to_string (next_tree, verbose);	break;
! 
!     c-objc-common.c uses 'D', 'F', 'T', and 'E'. They are covered by above.
!     toplev.c uses 'D', 'F', and 'T'.
! 
!    Because of the  size mask, the following may not work well. 
!    'L' needs special processing.  
! */
! 
!   FAT_GCC_EXT_A = FAT_SIZE_PTRDIFF_T + 1,
!   FAT_GCC_EXT_C,
!   FAT_GCC_EXT_D,
!   FAT_GCC_EXT_F,
!   FAT_GCC_EXT_L,
!   FAT_GCC_EXT_O,
!   FAT_GCC_EXT_P,
!   FAT_GCC_EXT_Q,
!   FAT_GCC_EXT_T,
!   FAT_GCC_EXT_V
  };
  
  struct numbered_arg
*************** struct spec
*** 163,168 ****
--- 201,209 ----
  #define isdigit(c) ((unsigned int) ((c) - '0') < 10)
  
  
+ int gcc_warning_extended_format_check;
+ 
+ 
  static int
  numbered_arg_compare (const void *p1, const void *p2)
  {
*************** format_parse (const char *format, char *
*** 630,635 ****
--- 671,715 ----
  		type |= (size & FAT_SIZE_MASK);
  		break;
  	      default:
+ 		if(gcc_warning_extended_format_check)
+ 		  {
+ 		    if(format[-1] == 'L')
+ 		      {
+ 			/* L is understood by XPG C format spec, too. 
+ 			   So if we ever want to use this msgfmt.c to
+ 			   check for format extension
+ 			   used within GCC's diagnostic extension,
+ 			   'L' should be followed by characters
+ 			   not understood by above case statements.
+ 			   For example, ' ' (space) would do. 
+ 			   I checked the ja.po for GCC and found that %L 
+ 			   is indeed followed by ' '.
+ 			*/
+ 			type = FAT_GCC_EXT_L; break;
+ 		      }
+ #define SET_GCC_EXTENDED(label,c) case label : type = FAT_GCC_EXT_##c; break;
+ 		    switch(*format)
+ 		      {
+ 			SET_GCC_EXTENDED('A',A);
+ 			SET_GCC_EXTENDED('C',C);
+ 			SET_GCC_EXTENDED('D',D);
+ 			SET_GCC_EXTENDED('F',F);
+ 			SET_GCC_EXTENDED('L',L);
+ 			SET_GCC_EXTENDED('O',O);
+ 			SET_GCC_EXTENDED('P',P);
+ 			SET_GCC_EXTENDED('Q',Q);
+ 			SET_GCC_EXTENDED('T',T);
+ 			SET_GCC_EXTENDED('V',V);
+ 
+ 		      default:
+ 			goto bad; 	/* unknown  */
+ 		      }
+ #undef SET_GCC_EXTENDED
+ 		    break;		/* get out of the outer switch. */
+ 
+ 		  }
+ 	      bad:;
+ 
  		*invalid_reason =
  		  (*format == '\0'
  		   ? INVALID_UNTERMINATED_DIRECTIVE ()
Index: msgfmt.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/msgfmt.c,v
retrieving revision 1.9
diff -c -3 -p -r1.9 msgfmt.c
*** msgfmt.c	29 Apr 2003 10:12:15 -0000	1.9
--- msgfmt.c	31 Jul 2003 17:05:30 -0000
*************** static int exit_status;
*** 78,83 ****
--- 78,86 ----
  /* If true include even fuzzy translations in output file.  */
  static bool include_all = false;
  
+ /* If true, do gcc diagnostic routine() extended format character check */
+ extern int gcc_warning_extended_format_check; /* in format-c.c */
+ 
  /* Specifies name of the output file.  */
  static const char *output_file_name;
  
*************** static const struct option long_options[
*** 155,160 ****
--- 158,166 ----
    { "check-format", no_argument, NULL, CHAR_MAX + 3 },
    { "check-header", no_argument, NULL, CHAR_MAX + 4 },
    { "directory", required_argument, NULL, 'D' },
+ 
+   { "gcc", no_argument, NULL, 'G' },
+ 
    { "help", no_argument, NULL, 'h' },
    { "java", no_argument, NULL, 'j' },
    { "java2", no_argument, NULL, CHAR_MAX + 5 },
*************** main (int argc, char *argv[])
*** 214,220 ****
    bindtextdomain (PACKAGE, relocate (LOCALEDIR));
    textdomain (PACKAGE);
  
!   while ((opt = getopt_long (argc, argv, "a:cCd:D:fhjl:o:Pr:vV", long_options,
  			     NULL))
  	 != EOF)
      switch (opt)
--- 220,226 ----
    bindtextdomain (PACKAGE, relocate (LOCALEDIR));
    textdomain (PACKAGE);
  
!   while ((opt = getopt_long (argc, argv, "a:cCd:D:fGhjl:o:Pr:vV", long_options,
  			     NULL))
  	 != EOF)
      switch (opt)
*************** main (int argc, char *argv[])
*** 248,253 ****
--- 254,264 ----
        case 'f':
  	include_all = true;
  	break;
+ 
+       case 'G':
+ 	gcc_warning_extended_format_check = true;
+ 	break;
+ 
        case 'h':
  	do_help = true;
  	break;
*************** Input file interpretation:\n"));
*** 586,591 ****
--- 597,604 ----
                                  menu items\n"));
        printf (_("\
    -f, --use-fuzzy             use fuzzy entries in output\n"));
+       printf (_("\
+   -G, --gcc                   check with GCC internal format extension.\n"));
        printf ("\n");
        printf (_("\
  Output details:\n"));
*************** check_pair (const char *msgid,
*** 1167,1173 ****
  					 msgid_plural == NULL,
  					 true, pretty_msgstr))
  			exit_status = EXIT_FAILURE;
! 
  		      parser->free (msgstr_descr);
  		    }
  		  else
--- 1180,1186 ----
  					 msgid_plural == NULL,
  					 true, pretty_msgstr))
  			exit_status = EXIT_FAILURE;
!  
  		      parser->free (msgstr_descr);
  		    }
  		  else
*************** check_pair (const char *msgid,
*** 1188,1195 ****
  	      parser->free (msgid_descr);
  	    }
  	  else
! 	    free (invalid_reason);
! 	}
  
    if (check_accelerators && msgid_plural == NULL)
      /* Test 4: Check that if msgid is a menu item with a keyboard accelerator,
--- 1201,1221 ----
  	      parser->free (msgid_descr);
  	    }
  	  else
! 	    {
! 	      /* we have forgotten to output error here?! */
! 	      error_with_progname = false;
! 	      error_at_line (0, 0, msgid_pos->file_name,
! 			     msgid_pos->line_number,
! 			     _("\
! '%s' is not a valid %s format string. Reason: %s"),
! 			     "msgid", format_language_pretty[i],
! 			     invalid_reason);
! 	      error_with_progname = true;
! 	      exit_status = EXIT_FAILURE;
!  
! 	      free (invalid_reason);
! 	    }
!     }
  
    if (check_accelerators && msgid_plural == NULL)
      /* Test 4: Check that if msgid is a menu item with a keyboard accelerator,
Index: xgettext.c
===================================================================
RCS file: /cvs/gettext/gettext/gettext-tools/src/xgettext.c,v
retrieving revision 1.16
diff -c -3 -p -r1.16 xgettext.c
*** xgettext.c	27 Jun 2003 12:35:05 -0000	1.16
--- xgettext.c	31 Jul 2003 17:05:32 -0000
***************
*** 80,85 ****
--- 80,88 ----
  #include "x-glade.h"
  
  
+ /* If true, do gcc diagnostic routine() extended format character check */
+ extern int gcc_warning_extended_format_check; /* in format-c.c */
+ 
  /* If nonzero add all comments immediately preceding one of the keywords. */
  static bool add_all_comments = false;
  
*************** static const struct option long_options[
*** 159,164 ****
--- 162,168 ----
    { "force-po", no_argument, &force_po, 1 },
    { "foreign-user", no_argument, NULL, CHAR_MAX + 2 },
    { "from-code", required_argument, NULL, CHAR_MAX + 3 },
+   { "gcc", no_argument, NULL, 'G'},
    { "help", no_argument, NULL, 'h' },
    { "indent", no_argument, NULL, 'i' },
    { "join-existing", no_argument, NULL, 'j' },
*************** main (int argc, char *argv[])
*** 244,250 ****
    xgettext_global_source_encoding = po_charset_ascii;
  
    while ((optchar = getopt_long (argc, argv,
! 				 "ac::Cd:D:eEf:Fhijk::l:L:m::M::no:p:sTVw:x:",
  				 long_options, NULL)) != EOF)
      switch (optchar)
        {
--- 248,254 ----
    xgettext_global_source_encoding = po_charset_ascii;
  
    while ((optchar = getopt_long (argc, argv,
! 				 "ac::Cd:D:eEf:FGhijk::l:L:m::M::no:p:sTVw:x:",
  				 long_options, NULL)) != EOF)
      switch (optchar)
        {
*************** main (int argc, char *argv[])
*** 296,301 ****
--- 300,308 ----
        case 'f':
  	files_from = optarg;
  	break;
+       case 'G':
+ 	gcc_warning_extended_format_check = true;
+ 	break;
        case 'F':
  	sort_by_filepos = true;
          break;
*************** Choice of input file language:\n"));
*** 663,668 ****
--- 670,678 ----
    -C, --c++                   shorthand for --language=C++\n"));
        printf (_("\
  By default the language is guessed depending on the input file name extension.\n"));
+       printf (_("\
+   -G, --gcc                   support internally used GCC diagnostic format\n"));
+ 
        printf ("\n");
        printf (_("\
  Input file interpretation:\n"));

--- end quote ---

-- 
int main(void){int j=2003;/*(c)2003 cishikawa. */
char t[] ="<CI> @abcdefghijklmnopqrstuvwxyz.,\n\"";
char *i ="g>qtCIuqivb,gCwe\np@.ietCIuqi\"tqkvv is>dnamz";
while(*i)((j+=strchr(t,*i++)-(int)t),(j%=sizeof t-1),
(putchar(t[j])));return 0;}/* under GPL */


