This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH PR/42686] Align the help text output
- From: Shujing Zhao <pearly dot zhao at oracle dot com>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Paolo Carlini <paolo dot carlini at oracle dot com>
- Date: Tue, 13 Apr 2010 19:30:25 +0800
- Subject: Re: [PATCH PR/42686] Align the help text output
- References: <4B863559.4000407@oracle.com> <Pine.LNX.4.64.1002251547410.30254@digraph.polyomino.org.uk> <4B976D51.8010705@oracle.com> <Pine.LNX.4.64.1003101656110.12539@digraph.polyomino.org.uk> <4B9A1E65.4080506@oracle.com> <Pine.LNX.4.64.1003121651160.31790@digraph.polyomino.org.uk> <4B9DFA37.9010303@oracle.com> <Pine.LNX.4.64.1003151052520.31811@digraph.polyomino.org.uk> <4BA07D89.3040406@oracle.com> <Pine.LNX.4.64.1003171201450.2148@digraph.polyomino.org.uk> <4BA2010C.9030103@oracle.com> <Pine.LNX.4.64.1003181650570.19807@digraph.polyomino.org.uk> <4BA35936.8070304@oracle.com> <Pine.LNX.4.64.1003191301180.30074@digraph.polyomino.org.uk> <4BA72D29.2080406@oracle.com> <4BB2E45A.30909@oracle.com> <Pine.LNX.4.64.1004061553230.4485@digraph.polyomino.org.uk>
On 04/07/2010 12:55 AM, Joseph S. Myers wrote:
Thanks. Fix the typo and all the comments. Change the confused variable
cur_len_to_print to len_to_print.
+ /* Compute the point to break the string. */
+ if (left_len == 0)
+ {
+ pre_len_to_print = cur_len_to_print;
+ cur_len_to_print = remaining;
I think it might be clearer to have the loop in the form
while (left_len > 0)
which at least would avoid questions of what mbrtowc is meant to do when
given a length of 0. After the loop, if cur_len_to_print is 0 or the
entire string would fit in the given number of columns, set
cur_len_to_print accordingly to indicate that the whole string should be
printed. Then make the loop mimic that in the non-i18n version of the
function as much as possible: if the previous columns occupy at least the
full width and a point to break has been found, then break; otherwise
decode a character; if a space then breaking *before* the space is OK; if
'-' or '/' (and the other conditions apply) then breaking *after* that
character is OK.
Incidentally, it would be a lot clearer if you had a single variable for
the byte index into the string (like the non-i18n version) instead of both
increasing msgstr and decreasing left_len - so the loop would be "while (i
< remaining)". You wouldn't need to do "remaining - left_len"
computations that way.
Both pre_len_to_print and len_to_print are initialized REMAINING now. If no
suitable line-breaking position has been found, the whole line will be printed.
a variable i is added to index the string to decode and use
while (i < remaining)
to control the loop. It looks clear. Thanks.
+ }
+ else if (iswspace (wc[0]))
+ pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
+ else if ((wc[0] == L'-' || wc[0] == L'/')
+ && iswalpha (pre_wc))
+ pre_len_to_print = cur_len_to_print = remaining - left_len;
Will pre_wc always be initialized here, if the string starts with '-' or
'/'?
It is changed to
+ else if ((wc[0] == L'-' || wc[0] == L'/')
+ && i > nbytes
+ && iswalpha (pre_wc))
(i > nbytes) would ensure the pre_wc is initialized. I found this solution from
the non-i18n version.
+ else if (nbytes > 1)
I still see no reason you should need conditionals on the number of bytes
in a character, instead of always working with characters regardless of
the number of bytes in them.
This condition is to make the string can be break after a multi bytes wide
alphabetic or punctuations. The line can be break after every multi bytes wide
alphabetic or punctuations. I think the nbytes can distinguish if it is a wide
character before decode. The letter 'a' is recognized wide character after the
decoding, but the line can't be broken after a 'a' at the string "after". I
think the difference between one byte wide character and the multi bytes wide
character is the nbytes.
Is it better changed to
+ else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1)
better?
+ {
+ pre_len_to_print = cur_len_to_print;
+ cur_len_to_print = remaining - left_len;
+ pre_wc = wc[0];
+ pre_nbytes = nbytes;
+ /* Keep the puncts printed with the previous word. */
I don't understand what issue you are dealing with here, but punctuation
other than '/' or '-' isn't considered a valid line-breaking point anyway
so you shouldn't need special code for it.
Since the line can be broken after a multi bytes alphabetic, the following
punctuation maybe be printed as the first character at the new line if the
punctuation didn't be dealing with. I also found at some special cases, only a
"full stop" printed at a new line. It looks strange. So I design to print these
punctuation with the previous alphabetic.
That is to say, if a wide character followed by a wide character punctuation,
the possible line-breaking position would not be set after this wide character.
It would be set after the punctuation.
yeah, the '/' and '-' must be considered. How about used (nbytes > 1) again to
distinguish the multi bytes wide character punctuation? It at least ensure the
Chinese multi bytes punctuations are handled. The other punctuation will be
handle just like the non-i18n version.
Thanks
Pearly
Index: intl.h
===================================================================
--- intl.h (revision 158214)
+++ intl.h (working copy)
@@ -61,6 +61,8 @@ extern const char *fake_ngettext(const c
#endif
extern char *get_spaces (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+ unsigned int);
extern const char *open_quote;
extern const char *close_quote;
Index: intl.c
===================================================================
--- intl.c (revision 158214)
+++ intl.c (working copy)
@@ -92,6 +92,7 @@ gcc_init_libintl (void)
#if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
#include <wchar.h>
+#include <wctype.h>
/* Returns the width in columns of MSGSTR, which came from gettext.
This is for indenting subsequent output. */
@@ -106,6 +107,103 @@ gcc_gettext_width (const char *msgstr)
return wcswidth (wmsgstr, nwcs);
}
+/* Look for a suitable point at which to break string MSGSTR. If there is one
+ after at most ROOM columns, take the latest such point. Otherwise,
+ take the first such point in the string, or the end of the string if it
+ cannot be broken.
+ REMAINING is the number of bytes of MSGSTR.
+ Return the number of bytes before the point. */
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+ unsigned int remaining)
+{
+ size_t nbytes;
+ wchar_t wc[1];
+ wchar_t pre_wc;
+
+ /* The byte index of the string when convert to a wide character. */
+ unsigned int i = 0;
+
+ /* The number of column positions for the printed wide-character string. */
+ unsigned int colwidth = 0;
+
+ /* The number of bytes that may be printed at the possible line-breaking
+ position found so far, or REMAINING if no suitable break has been found. */
+ unsigned int len_to_print = remaining;
+
+ /* The previous number of bytes that can be printed at the possible
+ line-breaking position found so far, or REMAINING if no suitable
+ break has been found. */
+ unsigned int pre_len_to_print = remaining;
+
+ mbstate_t state;
+ memset (&state, '\0', sizeof (state));
+
+ /* Convert one character at a time, keep track of the width at each point
+ and of the byte count to the last possible point for a line break. The
+ best candidate point found so far to break MSGSTR is after the space,
+ '-', '/', the multi bytes puncts or the multi bytes alphabetic.
+ If we are outside ROOM columns after LEN_TO_PRINT and have found
+ somewhere PRE_LEN_TO_PRINT to break the string, PRE_LEN_TO_PRINT is the
+ best point available; otherwise, the string will be break after
+ LEN_TO_PRINT. */
+ do
+ {
+ nbytes = mbrtowc (wc, msgstr, remaining - i, &state);
+ if (nbytes >= (size_t) -2)
+ /* Invalid input string. */
+ break;
+
+ i += nbytes;
+ colwidth += wcwidth (wc[0]);
+
+ /* Compute the point to break the string. */
+ if (i == remaining)
+ {
+ pre_len_to_print = len_to_print;
+ len_to_print = remaining;
+ }
+ else if (iswspace (wc[0]))
+ pre_len_to_print = len_to_print = i - nbytes;
+ else if ((wc[0] == L'-' || wc[0] == L'/')
+ && i > nbytes
+ && iswalpha (pre_wc))
+ pre_len_to_print = len_to_print = i;
+ else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1)
+ {
+ size_t nbytes_next;
+ pre_len_to_print = len_to_print;
+ len_to_print = i;
+ pre_wc = wc[0];
+ nbytes_next = mbrtowc (wc, msgstr + nbytes, remaining - i, &state);
+ if (nbytes_next >= (size_t) -2)
+ break;
+ /* Keep the multi bytes puncts printed with the previous word. */
+ if (iswpunct (wc[0]) && iswalpha (pre_wc) && nbytes_next > 1)
+ {
+ i += nbytes_next;
+ colwidth += wcwidth (wc[0]);
+ len_to_print = i;
+ msgstr += nbytes_next;
+ }
+ }
+
+ if (colwidth >= room)
+ {
+ /* If the string is outside ROOM columns after the possible
+ LEN_TO_PRINT, the break point will be set to the previous point. */
+ if (colwidth > room)
+ len_to_print = pre_len_to_print;
+ break;
+ }
+
+ msgstr += nbytes;
+ pre_wc = wc[0];
+ }
+ while (i < remaining);
+ return len_to_print;
+}
+
#else /* no wcswidth */
/* We don't have any way of knowing how wide the string is. Guess
@@ -119,9 +217,7 @@ gcc_gettext_width (const char *msgstr)
#endif
-#endif /* ENABLE_NLS */
-
-#ifndef ENABLE_NLS
+#else /* Not defined ENABLE_NLS */
const char *
fake_ngettext (const char *singular, const char *plural, unsigned long n)
@@ -132,6 +228,35 @@ fake_ngettext (const char *singular, con
return plural;
}
+#endif /* ENABLE_NLS */
+
+#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H \
+ && defined HAVE_WORKING_MBSTOWCS \
+ && defined HAVE_WCSWIDTH)
+
+/* Return the length of MSGSTR that can be printed within the width ROOM.
+ REMAINING is the length of string MSGSTR counted by bytes. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+ unsigned int remaining)
+{
+ unsigned int i;
+ unsigned int len = remaining;
+ for (i = 0; msgstr[i]; i++)
+ {
+ if (i >= room && len != remaining)
+ break;
+ if (msgstr[i] == ' ')
+ len = i;
+ else if ((msgstr[i] == '-' || msgstr[i] == '/')
+ && msgstr[i + 1] != ' '
+ && i > 0 && ISALPHA (msgstr[i - 1]))
+ len = i + 1;
+ }
+ return len;
+}
+
#endif
/* Return the indent for successive lines, using the width of
Index: opts.c
===================================================================
--- opts.c (revision 158214)
+++ opts.c (working copy)
@@ -1143,45 +1143,45 @@ decode_options (unsigned int argc, const
#define LEFT_COLUMN 27
-/* Output ITEM, of length ITEM_WIDTH, in the left column,
- followed by word-wrapped HELP in a second column. */
+/* Output ITEM in the left column and HELP in the second column with the total
+ COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the
+ corresponding number of spaces for alignment. If the ROOM is less than
+ REMAINING, look for a suitable point at which to break HELP, Otherwise, take
+ take the end of the string.
+ If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in
+ the left column from the secode line. */
static void
wrap_help (const char *help,
const char *item,
- unsigned int item_width,
unsigned int columns)
{
- unsigned int col_width = LEFT_COLUMN;
unsigned int remaining, room, len;
+ unsigned int col_width = LEFT_COLUMN;
+ unsigned int item_width = gcc_gettext_width (item);
remaining = strlen (help);
do
{
+ int n_spaces = col_width - item_width;
+ printf (" %s", item);
+ if (n_spaces > 0)
+ {
+ const char *space = " ";
+ printf ("%*s", n_spaces, space);
+ }
+
room = columns - 3 - MAX (col_width, item_width);
if (room > columns)
room = 0;
len = remaining;
if (room < len)
- {
- unsigned int i;
-
- for (i = 0; help[i]; i++)
- {
- if (i >= room && len != remaining)
- break;
- if (help[i] == ' ')
- len = i;
- else if ((help[i] == '-' || help[i] == '/')
- && help[i + 1] != ' '
- && i > 0 && ISALPHA (help[i - 1]))
- len = i + 1;
- }
- }
+ len = get_aligned_len (help, room, remaining);
- printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help);
- item_width = 0;
+ printf ( " %.*s\n", len, help);
+ item = " ";
+ item_width = 1;
while (help[len] == ' ')
len++;
help += len;
@@ -1220,7 +1220,7 @@ print_filtered_help (unsigned int includ
/* Get the translation. */
help = _(help);
- wrap_help (help, param, strlen (param), columns);
+ wrap_help (help, param, columns);
}
putchar ('\n');
return;
@@ -1236,6 +1236,7 @@ print_filtered_help (unsigned int includ
unsigned int len;
const char *opt;
const char *tab;
+ char *buf;
if (include_flags == 0
|| ((option->flags & include_flags) != include_flags))
@@ -1272,7 +1273,10 @@ print_filtered_help (unsigned int includ
if (tab)
{
len = tab - help;
- opt = help;
+ buf = (char *) alloca (len + 1);
+ strncpy (buf, help, len);
+ buf[len] = '\0';
+ opt = buf;
help = tab + 1;
}
else
@@ -1313,7 +1317,7 @@ print_filtered_help (unsigned int includ
help = new_help;
}
- wrap_help (help, opt, len, columns);
+ wrap_help (help, opt, columns);
displayed = true;
}