This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH PR/42686] Align the help text output
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: Shujing Zhao <pearly dot zhao at oracle dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Paolo Carlini <paolo dot carlini at oracle dot com>
- Date: Tue, 6 Apr 2010 16:55:51 +0000 (UTC)
- Subject: Re: [PATCH PR/42686] Align the help text output
- References: <4B863559.4000407@oracle.com> <Pine.LNX.4.64.1002251547410.30254@digraph.polyomino.org.uk> <4B976D51.8010705@oracle.com> <Pine.LNX.4.64.1003101656110.12539@digraph.polyomino.org.uk> <4B9A1E65.4080506@oracle.com> <Pine.LNX.4.64.1003121651160.31790@digraph.polyomino.org.uk> <4B9DFA37.9010303@oracle.com> <Pine.LNX.4.64.1003151052520.31811@digraph.polyomino.org.uk> <4BA07D89.3040406@oracle.com> <Pine.LNX.4.64.1003171201450.2148@digraph.polyomino.org.uk> <4BA2010C.9030103@oracle.com> <Pine.LNX.4.64.1003181650570.19807@digraph.polyomino.org.uk> <4BA35936.8070304@oracle.com> <Pine.LNX.4.64.1003191301180.30074@digraph.polyomino.org.uk> <4BA72D29.2080406@oracle.com> <4BB2E45A.30909@oracle.com>
On Wed, 31 Mar 2010, Shujing Zhao wrote:
> + unsigned int colwidth = 0;
No comment on this variable.
> + /* The number of bytes that have been found can be printed at ROOM columns. */
> + unsigned int pre_len_to_print;
The semantics of this variable seem confusing to me. There isn't a clear
difference between "have been found can be printed" and "can be printed".
> + /* The possible number of bytes that can be printed at ROOM columns. */
> + unsigned int cur_len_to_print = 0;
I think you mean something like "The number of bytes that can be printed
at the best possible line-breaking position found so far, or 0 if no
suitable break has been found.". Or is that pre_len_to_print?
> + /* The number of bytes that MSGSTR left to convert to wide character. */
Conversion to wide characters is part of the implementation of this loop,
not its interface. "The number of bytes of MSGSTR not yet examined for
possible line breaks."?
> + /* Convert one character at a time, keep track of the width at each point
> + and of the byte count to the last possible point for a line break. The
> + best candidate point found so far to break MSGSTR is after the space,
> + some puncts or multibyte character with the number of bytes
> + CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere
Do you mean "is after CUR_LEN_TO_PRINT bytes of the string"? Saying "with
the number of bytes" sounds like counting the bytes in a single character,
rather than being a position within the string.
> + PRE_LEN_TO_PRINT to break the string, that is the best point available;
Previously you said CUR_LEN_TO_PRINT is the best point so far. Now you're
saying PRE_LEN_TO_PRINT is best. Which is?
> + otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that
> + is a valid point to break. */
Why wouldn't it be valid, if it was previously the best valid point?
Perhaps you need to distinguish clearly whether a variable's value
represents what is known to be a valid point for breaking the string, or
might be one subject to further tests. "cur_" and "pre_" do not seem
adequate to represent that distinction. The variable for a value still
subject to further tests (if needed at all) should be local to a single
iteration of the loop.
> + while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0)
> + {
> + if (nbytes >= (size_t) -2)
> + /* Invalide input string. */
"Invalid".
> + /* Compute the point to break the string. */
> + if (left_len == 0)
> + {
> + pre_len_to_print = cur_len_to_print;
> + cur_len_to_print = remaining;
I think it might be clearer to have the loop in the form
while (left_len > 0)
which at least would avoid questions of what mbrtowc is meant to do when
given a length of 0. After the loop, if cur_len_to_print is 0 or the
entire string would fit in the given number of columns, set
cur_len_to_print accordingly to indicate that the whole string should be
printed. Then make the loop mimic that in the non-i18n version of the
function as much as possible: if the previous columns occupy at least the
full width and a point to break has been found, then break; otherwise
decode a character; if a space then breaking *before* the space is OK; if
'-' or '/' (and the other conditions apply) then breaking *after* that
character is OK.
Incidentally, it would be a lot clearer if you had a single variable for
the byte index into the string (like the non-i18n version) instead of both
increasing msgstr and decreasing left_len - so the loop would be "while (i
< remaining)". You wouldn't need to do "remaining - left_len"
computations that way.
> + }
> + else if (iswspace (wc[0]))
> + pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
> + else if ((wc[0] == L'-' || wc[0] == L'/')
> + && iswalpha (pre_wc))
> + pre_len_to_print = cur_len_to_print = remaining - left_len;
Will pre_wc always be initialized here, if the string starts with '-' or
'/'?
> + else if (nbytes > 1)
I still see no reason you should need conditionals on the number of bytes
in a character, instead of always working with characters regardless of
the number of bytes in them.
> + {
> + pre_len_to_print = cur_len_to_print;
> + cur_len_to_print = remaining - left_len;
> + pre_wc = wc[0];
> + pre_nbytes = nbytes;
> + /* Keep the puncts printed with the previous word. */
I don't understand what issue you are dealing with here, but punctuation
other than '/' or '-' isn't considered a valid line-breaking point anyway
so you shouldn't need special code for it.
--
Joseph S. Myers
joseph@codesourcery.com