This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH PR/42686] Align the help text output


On 04/07/2010 12:55 AM, Joseph S. Myers wrote:
Thanks. Fix the typo and all the comments. Change the confused variable cur_len_to_print to len_to_print.
+      /* Compute the point to break the string. */
+      if (left_len == 0)
+	{
+	  pre_len_to_print = cur_len_to_print;
+	  cur_len_to_print = remaining;

I think it might be clearer to have the loop in the form


while (left_len > 0)

which at least would avoid questions of what mbrtowc is meant to do when given a length of 0. After the loop, if cur_len_to_print is 0 or the entire string would fit in the given number of columns, set cur_len_to_print accordingly to indicate that the whole string should be printed. Then make the loop mimic that in the non-i18n version of the function as much as possible: if the previous columns occupy at least the full width and a point to break has been found, then break; otherwise decode a character; if a space then breaking *before* the space is OK; if '-' or '/' (and the other conditions apply) then breaking *after* that character is OK.

Incidentally, it would be a lot clearer if you had a single variable for the byte index into the string (like the non-i18n version) instead of both increasing msgstr and decreasing left_len - so the loop would be "while (i < remaining)". You wouldn't need to do "remaining - left_len" computations that way.
Both pre_len_to_print and len_to_print are initialized REMAINING now. If no suitable line-breaking position has been found, the whole line will be printed.
a variable i is added to index the string to decode and use


while (i < remaining)

to control the loop. It looks clear. Thanks.

+	}
+      else if (iswspace (wc[0]))
+	pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
+      else  if ((wc[0] == L'-' || wc[0] == L'/')
+	        && iswalpha (pre_wc))
+	pre_len_to_print = cur_len_to_print = remaining - left_len;

Will pre_wc always be initialized here, if the string starts with '-' or '/'?
It is changed to
+ else if ((wc[0] == L'-' || wc[0] == L'/')
+ && i > nbytes
+ && iswalpha (pre_wc))
(i > nbytes) would ensure the pre_wc is initialized. I found this solution from the non-i18n version.

+ else if (nbytes > 1)

I still see no reason you should need conditionals on the number of bytes in a character, instead of always working with characters regardless of the number of bytes in them.
This condition is to make the string can be break after a multi bytes wide alphabetic or punctuations. The line can be break after every multi bytes wide alphabetic or punctuations. I think the nbytes can distinguish if it is a wide character before decode. The letter 'a' is recognized wide character after the decoding, but the line can't be broken after a 'a' at the string "after". I think the difference between one byte wide character and the multi bytes wide character is the nbytes.

Is it better changed to

+ else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1)

better?

+	{
+	  pre_len_to_print = cur_len_to_print;
+	  cur_len_to_print = remaining - left_len;
+	  pre_wc = wc[0];
+	  pre_nbytes = nbytes;
+	  /* Keep the puncts printed with the previous word. */

I don't understand what issue you are dealing with here, but punctuation other than '/' or '-' isn't considered a valid line-breaking point anyway so you shouldn't need special code for it.


Since the line can be broken after a multi bytes alphabetic, the following punctuation maybe be printed as the first character at the new line if the punctuation didn't be dealing with. I also found at some special cases, only a "full stop" printed at a new line. It looks strange. So I design to print these punctuation with the previous alphabetic.
That is to say, if a wide character followed by a wide character punctuation, the possible line-breaking position would not be set after this wide character. It would be set after the punctuation.
yeah, the '/' and '-' must be considered. How about used (nbytes > 1) again to distinguish the multi bytes wide character punctuation? It at least ensure the Chinese multi bytes punctuations are handled. The other punctuation will be handle just like the non-i18n version.


Thanks
Pearly
Index: intl.h
===================================================================
--- intl.h	(revision 158214)
+++ intl.h	(working copy)
@@ -61,6 +61,8 @@ extern const char *fake_ngettext(const c
 #endif
 
 extern char *get_spaces (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+				     unsigned int);
 
 extern const char *open_quote;
 extern const char *close_quote;
Index: intl.c
===================================================================
--- intl.c	(revision 158214)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@ gcc_init_libintl (void)
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,103 @@ gcc_gettext_width (const char *msgstr)
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Look for a suitable point at which to break string MSGSTR. If there is one 
+   after at most ROOM columns, take the latest such point. Otherwise, 
+   take the first such point in the string, or the end of the string if it 
+   cannot be broken.
+   REMAINING is the number of bytes of MSGSTR.
+   Return the number of bytes before the point. */
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  size_t nbytes;
+  wchar_t wc[1]; 
+  wchar_t pre_wc;
+
+  /* The byte index of the string when convert to a wide character. */
+  unsigned int i = 0;
+  
+  /* The number of column positions for the printed wide-character string. */
+  unsigned int colwidth = 0;
+
+  /* The number of bytes that may be printed at the possible line-breaking
+     position found so far, or REMAINING if no suitable break has been found. */
+  unsigned int len_to_print = remaining;
+
+  /* The previous number of bytes that can be printed at the possible
+     line-breaking position found so far, or REMAINING if no suitable
+     break has been found. */
+  unsigned int pre_len_to_print = remaining;
+
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+
+  /* Convert one character at a time, keep track of the width at each point 
+     and of the byte count to the last possible point for a line break. The 
+     best candidate point found so far to break MSGSTR is after the space,
+     '-', '/', the multi bytes puncts or the multi bytes alphabetic.
+     If we are outside ROOM columns after LEN_TO_PRINT and have found
+     somewhere PRE_LEN_TO_PRINT to break the string, PRE_LEN_TO_PRINT is the
+     best point available; otherwise, the string will be break after
+     LEN_TO_PRINT. */ 
+  do
+    {
+      nbytes = mbrtowc (wc, msgstr, remaining - i, &state);
+      if (nbytes >= (size_t) -2)
+	/* Invalid input string. */
+	break;
+
+      i += nbytes;
+      colwidth += wcwidth (wc[0]);
+
+      /* Compute the point to break the string. */
+      if (i == remaining)
+	{
+	  pre_len_to_print = len_to_print;
+	  len_to_print = remaining;
+	}
+      else if (iswspace (wc[0]))
+	pre_len_to_print = len_to_print = i - nbytes;
+      else if ((wc[0] == L'-' || wc[0] == L'/')
+	       && i > nbytes
+	       && iswalpha (pre_wc))
+	pre_len_to_print = len_to_print = i;
+      else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1)
+	{
+	  size_t nbytes_next;
+	  pre_len_to_print = len_to_print;
+	  len_to_print = i;
+	  pre_wc = wc[0];
+	  nbytes_next = mbrtowc (wc, msgstr + nbytes, remaining - i, &state);
+	  if (nbytes_next >= (size_t) -2)
+	    break;
+	  /* Keep the multi bytes puncts printed with the previous word. */
+	  if (iswpunct (wc[0]) && iswalpha (pre_wc) && nbytes_next > 1)
+	    {
+	      i += nbytes_next;
+	      colwidth += wcwidth (wc[0]);
+	      len_to_print = i;
+	      msgstr += nbytes_next;
+	    }
+	}
+
+      if (colwidth >= room)
+	{
+	  /* If the string is outside ROOM columns after the possible 
+	     LEN_TO_PRINT, the break point will be set to the previous point. */
+	  if (colwidth > room)
+	    len_to_print = pre_len_to_print;
+	  break;
+	}
+
+      msgstr += nbytes;
+      pre_wc = wc[0];
+    }
+  while (i < remaining);
+  return len_to_print;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -119,9 +217,7 @@ gcc_gettext_width (const char *msgstr)
 
 #endif
 
-#endif /* ENABLE_NLS */
-
-#ifndef ENABLE_NLS
+#else /* Not defined ENABLE_NLS */
 
 const char *
 fake_ngettext (const char *singular, const char *plural, unsigned long n)
@@ -132,6 +228,35 @@ fake_ngettext (const char *singular, con
   return plural;
 }
 
+#endif /* ENABLE_NLS */
+
+#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H             \
+			       && defined HAVE_WORKING_MBSTOWCS \
+			       && defined HAVE_WCSWIDTH)
+
+/* Return the length of MSGSTR that can be printed within the width ROOM. 
+   REMAINING is the length of string MSGSTR counted by bytes. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = remaining;
+  for (i = 0; msgstr[i]; i++)
+    {
+      if (i >= room && len != remaining)
+	break;
+      if (msgstr[i] == ' ')
+	len = i;
+      else if ((msgstr[i] == '-' || msgstr[i] == '/')
+	       && msgstr[i + 1] != ' '
+	       && i > 0 && ISALPHA (msgstr[i - 1]))
+	len = i + 1;
+    }
+  return len;
+}
+
 #endif
 
 /* Return the indent for successive lines, using the width of
Index: opts.c
===================================================================
--- opts.c	(revision 158214)
+++ opts.c	(working copy)
@@ -1143,45 +1143,45 @@ decode_options (unsigned int argc, const
 
 #define LEFT_COLUMN	27
 
-/* Output ITEM, of length ITEM_WIDTH, in the left column,
-   followed by word-wrapped HELP in a second column.  */
+/* Output ITEM in the left column and HELP in the second column with the total
+   COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the
+   corresponding number of spaces for alignment. If the ROOM is less than
+   REMAINING, look for a suitable point at which to break HELP, Otherwise, take
+   take the end of the string.
+   If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in
+   the left column from the secode line. */
 static void
 wrap_help (const char *help,
 	   const char *item,
-	   unsigned int item_width,
 	   unsigned int columns)
 {
-  unsigned int col_width = LEFT_COLUMN;
   unsigned int remaining, room, len;
+  unsigned int col_width = LEFT_COLUMN;
+  unsigned int item_width = gcc_gettext_width (item);
 
   remaining = strlen (help);
 
   do
     {
+      int n_spaces = col_width - item_width;
+      printf ("  %s", item);
+      if (n_spaces > 0)
+	{
+	  const char *space = " ";
+	  printf ("%*s", n_spaces, space);
+	} 
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
       len = remaining;
 
       if (room < len)
-	{
-	  unsigned int i;
-
-	  for (i = 0; help[i]; i++)
-	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
-	    }
-	}
+	len = get_aligned_len (help, room, remaining);
 
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
-      item_width = 0;
+      printf ( " %.*s\n", len, help); 
+      item = " ";
+      item_width = 1;
       while (help[len] == ' ')
 	len++;
       help += len;
@@ -1220,7 +1220,7 @@ print_filtered_help (unsigned int includ
 	  /* Get the translation.  */
 	  help = _(help);
 
-	  wrap_help (help, param, strlen (param), columns);
+	  wrap_help (help, param, columns);
 	}
       putchar ('\n');
       return;
@@ -1236,6 +1236,7 @@ print_filtered_help (unsigned int includ
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1272,7 +1273,10 @@ print_filtered_help (unsigned int includ
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *) alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else
@@ -1313,7 +1317,7 @@ print_filtered_help (unsigned int includ
 	  help = new_help;
 	}
 
-      wrap_help (help, opt, len, columns);
+      wrap_help (help, opt, columns);
       displayed = true;
     }
 

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]