This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH PR/42686] Align the help text output

On Wed, 31 Mar 2010, Shujing Zhao wrote:

> +  unsigned int colwidth = 0;

No comment on this variable.

> +  /* The number of bytes that have been found can be printed at ROOM columns. */
> +  unsigned int pre_len_to_print;

The semantics of this variable seem confusing to me.  There isn't a clear 
difference between "have been found can be printed" and "can be printed".

> +  /* The possible number of bytes that can be printed at ROOM columns. */
> +  unsigned int cur_len_to_print = 0;

I think you mean something like "The number of bytes that can be printed 
at the best possible line-breaking position found so far, or 0 if no 
suitable break has been found.".  Or is that pre_len_to_print?

> +  /* The number of bytes that MSGSTR left to convert to wide character. */

Conversion to wide characters is part of the implementation of this loop, 
not its interface.  "The number of bytes of MSGSTR not yet examined for 
possible line breaks."?

> +  /* Convert one character at a time, keep track of the width at each point 
> +     and of the byte count to the last possible point for a line break. The 
> +     best candidate point found so far to break MSGSTR is after the space,
> +     some puncts or multibyte character with the number of bytes
> +     CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere

Do you mean "is after CUR_LEN_TO_PRINT bytes of the string"?  Saying "with 
the number of bytes" sounds like counting the bytes in a single character, 
rather than being a position within the string.

> +     PRE_LEN_TO_PRINT to break the string, that is the best point available;

Previously you said CUR_LEN_TO_PRINT is the best point so far.  Now you're 
saying PRE_LEN_TO_PRINT is best.  Which is?

> +     otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that
> +     is a valid point to break. */ 

Why wouldn't it be valid, if it was previously the best valid point?

Perhaps you need to distinguish clearly whether a variable's value 
represents what is known to be a valid point for breaking the string, or 
might be one subject to further tests.  "cur_" and "pre_" do not seem 
adequate to represent that distinction.  The variable for a value still 
subject to further tests (if needed at all) should be local to a single 
iteration of the loop.

> +  while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0)
> +    {
> +      if (nbytes >= (size_t) -2)
> +	/* Invalide input string. */


> +      /* Compute the point to break the string. */
> +      if (left_len == 0)
> +	{
> +	  pre_len_to_print = cur_len_to_print;
> +	  cur_len_to_print = remaining;

I think it might be clearer to have the loop in the form

  while (left_len > 0)

which at least would avoid questions of what mbrtowc is meant to do when 
given a length of 0.  After the loop, if cur_len_to_print is 0 or the 
entire string would fit in the given number of columns, set 
cur_len_to_print accordingly to indicate that the whole string should be 
printed.  Then make the loop mimic that in the non-i18n version of the 
function as much as possible: if the previous columns occupy at least the 
full width and a point to break has been found, then break; otherwise 
decode a character; if a space then breaking *before* the space is OK; if 
'-' or '/' (and the other conditions apply) then breaking *after* that 
character is OK.

Incidentally, it would be a lot clearer if you had a single variable for 
the byte index into the string (like the non-i18n version) instead of both 
increasing msgstr and decreasing left_len - so the loop would be "while (i 
< remaining)".  You wouldn't need to do "remaining - left_len" 
computations that way.

> +	}
> +      else if (iswspace (wc[0]))
> +	pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
> +      else  if ((wc[0] == L'-' || wc[0] == L'/')
> +	        && iswalpha (pre_wc))
> +	pre_len_to_print = cur_len_to_print = remaining - left_len;

Will pre_wc always be initialized here, if the string starts with '-' or 

> +      else if (nbytes > 1)

I still see no reason you should need conditionals on the number of bytes 
in a character, instead of always working with characters regardless of 
the number of bytes in them.

> +	{
> +	  pre_len_to_print = cur_len_to_print;
> +	  cur_len_to_print = remaining - left_len;
> +	  pre_wc = wc[0];
> +	  pre_nbytes = nbytes;
> +	  /* Keep the puncts printed with the previous word. */

I don't understand what issue you are dealing with here, but punctuation 
other than '/' or '-' isn't considered a valid line-breaking point anyway 
so you shouldn't need special code for it.

Joseph S. Myers

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]