This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

On 07/19/18 22:03, Martin Sebor wrote:
> On 07/19/2018 07:23 AM, Bernd Edlinger wrote:
>>> @@ -633,12 +642,17 @@ c_strlen (tree src, int only_value)
>>>      return ssize_int (0);
>>>        /* We don't know the starting offset, but we do know that the string
>>> -     has no internal zero bytes.  We can assume that the offset falls
>>> -     within the bounds of the string; otherwise, the programmer deserves
>>> -     what he gets.  Subtract the offset from the length of the string,
>>> -     and return that.  This would perhaps not be valid if we were dealing
>>> -     with named arrays in addition to literal string constants.  */
>>> -      return size_diffop_loc (loc, size_int (maxelts * eltsize), byteoff);
>>> +     has no internal zero bytes.  If the offset falls within the bounds
>>> +     of the string subtract the offset from the length of the string,
>>> +     and return that.  Otherwise the length is zero.  Take care to
>>> +     use SAVE_EXPR in case the OFFSET has side-effects.  */
>>> +      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
>>> +      offsave = fold_convert (ssizetype, offsave);
>>> +      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>> +                      build_int_cst (ssizetype, len * eltsize));
>>> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
>>> +      return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
>>> +                  build_zero_cst (ssizetype));
>> This computes the number of bytes.
>> c_strlen is supposed to return number of (wide) characters:
> You're right.  I guess that must mean the function isn't used
> for wide character strings.  The original code also computes
> a byte offset and so does the new expression.  I have no
> problem changing it, but if feels like a change that should
> be made independently of this bug fix.
>> /* Compute the length of a null-terminated character string or wide
>>     character string handling character sizes of 1, 2, and 4 bytes.
>>     TREE_STRING_LENGTH is not the right way because it evaluates to
>>     the size of the character array in bytes (as opposed to characters)
>>     and because it can contain a zero byte in the middle.
>>> @@ -11343,16 +11356,15 @@ string_constant (tree arg, tree *ptr_offset)
>>>      {
>>>        if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
>>>      return NULL_TREE;
>>> -      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
>>> -    {
>>> -      /* Add the scaled variable index to the constant offset.  */
>>> -      tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
>>> -                     fold_convert (sizetype, varidx),
>>> -                     eltsize);
>>> -      offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
>>> -    }
>>> -      else
>>> -    return NULL_TREE;
>>> +
>>> +      while (TREE_CODE (chartype) != INTEGER_TYPE)
>>> +    chartype = TREE_TYPE (chartype);
>>> +
>>> +      /* Set the non-constant offset to the non-constant index scaled
>>> +     by the size of the character type.  */
>>> +      offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
>>> +                fold_convert (sizetype, varidx),
>>> +                TYPE_SIZE_UNIT (chartype));
>> here you fix the computation for wide character strings,
>> but I see no test cases with wide character stings.
> The change above corrects the offset when eltsize refers to
> the size of an array rather than its character elements.
> It wasn't made to fix how wide strings are handled.  I don't
> know if the function is ever called for them in a way that
> matters (there are no wide character built-ins).  But it's
> something to look into.

Now I am suprised.
your loop above locates always an INTEGER_TYPE,
in the later version it skips all ARRAY_TYPES,
So TYPE_SIZE_UNIT is never the size of an array,
it is always the element type (char, wchar_t, int *)
that the array is made of.
So you compute offset = varidx * sizeof(char)
but sizeof (char) is 1, so I thought naturally
you are concerned about sizeof(wchar_t).

And indeed I can write strlen((char*) & L"test"[i]).

>> But down here you use a non-wide character function on a
>> wide character string:
>>    /* Avoid returning a string that doesn't fit in the array
>>       it is stored in, like
>>       const char a[4] = "abcde";
>>       but do handle those that fit even if they have excess
>>       initializers, such as in
>>       const char a[4] = "abc\000\000";
>>       The excess elements contribute to TREE_STRING_LENGTH()
>>       but not to strlen().  */
>>    unsigned HOST_WIDE_INT length
>>      = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
>> Actually I begin to wonder, if all this wide character stuff is
>> really so common that we have to optimize it.
> I've replaced the strnlen() call with string_length() in
> the updated patch.  The code is entered for wchar_t (and
> other) strings but since there are no wide character
> built-ns in GCC I suspect that optimizations that apply
> to them that don't also apply to other arrays are few
> and far between.
> At the same time, as wide characters (beyond wchar_t) are
> increasingly becoming used in software I wouldn't want to
> preclude future optimizations from making use of those that
> benefit narrow strings.  In any event, that too is
> a discussion to have independently of this bug fix.

No doubt that they exist.  But I would personally prefer
to implement that step by step, and it would absolutely
acceptable to bail out on wide char arguments for now.

But if they are implemented, then there has to be some
test coverage as well.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]