This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH PR/42686] Align the help text output

From: "Joseph S. Myers" <joseph at codesourcery dot com>
To: Shujing Zhao <pearly dot zhao at oracle dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Paolo Carlini <paolo dot carlini at oracle dot com>
Date: Tue, 6 Apr 2010 16:55:51 +0000 (UTC)
Subject: Re: [PATCH PR/42686] Align the help text output
References: <4B863559.4000407@oracle.com> <Pine.LNX.4.64.1002251547410.30254@digraph.polyomino.org.uk> <4B976D51.8010705@oracle.com> <Pine.LNX.4.64.1003101656110.12539@digraph.polyomino.org.uk> <4B9A1E65.4080506@oracle.com> <Pine.LNX.4.64.1003121651160.31790@digraph.polyomino.org.uk> <4B9DFA37.9010303@oracle.com> <Pine.LNX.4.64.1003151052520.31811@digraph.polyomino.org.uk> <4BA07D89.3040406@oracle.com> <Pine.LNX.4.64.1003171201450.2148@digraph.polyomino.org.uk> <4BA2010C.9030103@oracle.com> <Pine.LNX.4.64.1003181650570.19807@digraph.polyomino.org.uk> <4BA35936.8070304@oracle.com> <Pine.LNX.4.64.1003191301180.30074@digraph.polyomino.org.uk> <4BA72D29.2080406@oracle.com> <4BB2E45A.30909@oracle.com>

On Wed, 31 Mar 2010, Shujing Zhao wrote:

> +  unsigned int colwidth = 0;

No comment on this variable.

> +  /* The number of bytes that have been found can be printed at ROOM columns. */
> +  unsigned int pre_len_to_print;

The semantics of this variable seem confusing to me.  There isn't a clear 
difference between "have been found can be printed" and "can be printed".

> +  /* The possible number of bytes that can be printed at ROOM columns. */
> +  unsigned int cur_len_to_print = 0;

I think you mean something like "The number of bytes that can be printed 
at the best possible line-breaking position found so far, or 0 if no 
suitable break has been found.".  Or is that pre_len_to_print?

> +  /* The number of bytes that MSGSTR left to convert to wide character. */

Conversion to wide characters is part of the implementation of this loop, 
not its interface.  "The number of bytes of MSGSTR not yet examined for 
possible line breaks."?

> +  /* Convert one character at a time, keep track of the width at each point 
> +     and of the byte count to the last possible point for a line break. The 
> +     best candidate point found so far to break MSGSTR is after the space,
> +     some puncts or multibyte character with the number of bytes
> +     CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere

Do you mean "is after CUR_LEN_TO_PRINT bytes of the string"?  Saying "with 
the number of bytes" sounds like counting the bytes in a single character, 
rather than being a position within the string.

> +     PRE_LEN_TO_PRINT to break the string, that is the best point available;

Previously you said CUR_LEN_TO_PRINT is the best point so far.  Now you're 
saying PRE_LEN_TO_PRINT is best.  Which is?

> +     otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that
> +     is a valid point to break. */ 

Why wouldn't it be valid, if it was previously the best valid point?

Perhaps you need to distinguish clearly whether a variable's value 
represents what is known to be a valid point for breaking the string, or 
might be one subject to further tests.  "cur_" and "pre_" do not seem 
adequate to represent that distinction.  The variable for a value still 
subject to further tests (if needed at all) should be local to a single 
iteration of the loop.

> +  while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0)
> +    {
> +      if (nbytes >= (size_t) -2)
> +	/* Invalide input string. */

"Invalid".

> +      /* Compute the point to break the string. */
> +      if (left_len == 0)
> +	{
> +	  pre_len_to_print = cur_len_to_print;
> +	  cur_len_to_print = remaining;

I think it might be clearer to have the loop in the form

  while (left_len > 0)

which at least would avoid questions of what mbrtowc is meant to do when 
given a length of 0.  After the loop, if cur_len_to_print is 0 or the 
entire string would fit in the given number of columns, set 
cur_len_to_print accordingly to indicate that the whole string should be 
printed.  Then make the loop mimic that in the non-i18n version of the 
function as much as possible: if the previous columns occupy at least the 
full width and a point to break has been found, then break; otherwise 
decode a character; if a space then breaking *before* the space is OK; if 
'-' or '/' (and the other conditions apply) then breaking *after* that 
character is OK.

Incidentally, it would be a lot clearer if you had a single variable for 
the byte index into the string (like the non-i18n version) instead of both 
increasing msgstr and decreasing left_len - so the loop would be "while (i 
< remaining)".  You wouldn't need to do "remaining - left_len" 
computations that way.

> +	}
> +      else if (iswspace (wc[0]))
> +	pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
> +      else  if ((wc[0] == L'-' || wc[0] == L'/')
> +	        && iswalpha (pre_wc))
> +	pre_len_to_print = cur_len_to_print = remaining - left_len;

Will pre_wc always be initialized here, if the string starts with '-' or 
'/'?

> +      else if (nbytes > 1)

I still see no reason you should need conditionals on the number of bytes 
in a character, instead of always working with characters regardless of 
the number of bytes in them.

> +	{
> +	  pre_len_to_print = cur_len_to_print;
> +	  cur_len_to_print = remaining - left_len;
> +	  pre_wc = wc[0];
> +	  pre_nbytes = nbytes;
> +	  /* Keep the puncts printed with the previous word. */

I don't understand what issue you are dealing with here, but punctuation 
other than '/' or '-' isn't considered a valid line-breaking point anyway 
so you shouldn't need special code for it.

-- 
Joseph S. Myers
joseph@codesourcery.com

Follow-Ups:
- Re: [PATCH PR/42686] Align the help text output
  - From: Shujing Zhao

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]