[PATCH] Add a character size parameter to c_strlen/get_range_strlen

Mon Aug 20 23:23:00 GMT 2018

On 08/20/2018 04:17 PM, Jeff Law wrote:
> On 08/18/2018 11:38 AM, Martin Sebor wrote:
>> On 08/17/2018 09:32 PM, Jeff Law wrote:
>>> On 08/17/2018 02:17 PM, Martin Sebor wrote:
>>>> On 08/17/2018 12:44 PM, Bernd Edlinger wrote:
>>>>> On 08/17/18 20:23, Martin Sebor wrote:
>>>>>> On 08/17/2018 06:14 AM, Joseph Myers wrote:
>>>>>>> On Fri, 17 Aug 2018, Jeff Law wrote:
>>>>>>>
>>>>>>>> On 08/16/2018 05:01 PM, Joseph Myers wrote:
>>>>>>>>> On Thu, 16 Aug 2018, Jeff Law wrote:
>>>>>>>>>
>>>>>>>>>> restores previous behavior.  The sprintf bits want to count
>>>>>>>>>> element
>>>>>>>>>> sized chunks, which for wchars is 4 bytes (that count will then be
>>>>>>>>>
>>>>>>>>>>    /* Compute the range the argument's length can be in.  */
>>>>>>>>>> -  fmtresult slen = get_string_length (arg);
>>>>>>>>>> +  int count_by = dir.specifier == 'S' || dir.modifier ==
>>>>>>>>>> FMT_LEN_l ? 4 : 1;
>>>>>>>>>
>>>>>>>>> I don't see how a hardcoded 4 is correct here.  Surely you need to
>>>>>>>>> example
>>>>>>>>> wchar_type_node to determine its actual size for this target.
>>>>>>>> We did kick this around a little.  IIRC Martin didn't think that it
>>>>>>>> was
>>>>>>>> worth handling the 2 byte wchar case.
>>>>>>
>>>>>> Sorry, I think we may have miscommunicated -- I didn't think it
>>>>>> was useful to pass a size of the character type to the function.
>>>>>> I agree that passing in a hardcoded constant doesn't seem right
>>>>>> (even if GCC's wchar_t were always 4 bytes wide).
>>>>>>
>>>>>> I'm still not sure I see the benefit of passing in the expected
>>>>>> element size given that the patch causes c_strlen() fail when
>>>>>> the actual element size doesn't match ELTSIZE.  If the caller
>>>>>> asks for the number of bytes in a const wchar_t array it should
>>>>>> get back the number bytes.  (I could see it fail if the caller
>>>>>> asked for the number of words in a char array whose size was
>>>>>> not evenly divisible by wordsize.)
>>>>>>
>>>>>
>>>>> I think in this case c_strlen should use the type which the %S format
>>>>> uses at runtime, otherwise it will not have anything to do with
>>>>> the reality.
>>>>
>>>> %S is not what I'm talking about.
>>>>
>>>> Failing in the case I described (a caller asking for the size
>>>> in bytes of a constant object whose type is larger) prevents
>>>> callers that don't care about the type from detecting the actual
>>>> size of a constant.
>>>>
>>>> Specifically for sprintf, it means that the buffer overflow
>>>> below is no longer diagnosed:
>>>>
>>>>   struct S { char a[2]; void (*pf)(void); };
>>>>
>>>>   void f (struct S *p)
>>>>   {
>>>>     const char *q = (char*)L"\x41424344\x45464748";
>>>>
>>>>     sprintf (p->a, "%s", q);
>>>>   }
>>> I don't think this is in the testsuite, is it?  I verified that there
>>> was no regressions when I installed Bernd's patch and when I installed
>>> yours.
>>
>> No, there are very few tests that exercise these kinds of mixed
>> argument types.  Code like that is most likely the result of
>> a mistake, but it's not the kind of a bug I had even thought
>> about until some of the codegen issues with mixed argument types
>> were brought up (PR 86711/86714).
> Phew.  I was worried I'd somehow missed the failure or tested the wrong
> patch or who knows what.
>
> Can you add that test, xfailed for now to the testsuite?

Done in r263676.

>> I would just like the ability to get the length/size somehow
>> so that the questionable code that started us down this path
>> can be detected.  This is not just the sprintf(d, "%s", L"1")
>> kind of a mismatch but also the missing nul detection in cases
>> like:
> [ ... ]
> I know.  And ideally we'll be able to handle everything you want to.
> But we also have to realize that sometimes we may have to punt.
>
> Also remember that incremental progress is (usually) good.
>
>
>>
>>   const wchar_t a[1] = L"\xffffffff";
>>   strcpy(d, (char*)a);
> This touches on both the representational issue (excess elements in the
> initializer) and how to handle a non-terminated string.  Both are issues
> we're debating right now.
>
>>
>> I don't think this is necessarily an important use case to
>> optimize for but it is one that GCC optimizes already nonetheless,
>> and always has.  For example, this has always been folded into 4
>> by GCC and still is even with the patch:
>>
>>   const __WCHAR_TYPE__ wc[] = L"\x12345678";
>>
>>   int f (void)
>>   {
>>     const char *p = (char*)wc;
>>     return strlen (p);            // folded to 4
>>   }
>>
>> It's optimized because fold_const_call() relies on c_getstr()
>> which returns the underlying byte representation of the wide
>> string, and despite c_strlen() now trying to prevent it.
> And I think you're hitting on some of issues already raised in the thread.
>
> In this specific case though ISTM 4 is the right answer.  We casted to a
> char *, so that's what we should be counting.  Or am I missing
> something?  Also note that's the value we'd get from the strlen C
> library call IIUC.

It is the right answer.  My point is that this is optimized
but the change to c_strlen() prevents the same optimization
in other similar cases.  For example, GCC 6 folds this into
memcpy:

   __builtin_strcpy (d, (char*)L"\x12345678");

GCC 7 and 8 do too but get the byte count wrong (my bad).

Current trunk doesn't optimize it.  If restoring the original
behavior is the intent (and not just fixing the counting bug)
then c_strlen() should be fixed to fold this again.

(I'm not a fan of the strcpy to memcpy transformation because
it makes other things difficult but that's beside the point.)

>> The missing nul variant of the same test case isn't folded (it
>> ends up calling the library strlen) but the bug cannot be
>> detected using the modified c_strlen():
>>
>>   const __WCHAR_TYPE__ wc[1] = L"\x12345678";   // no nul
>>
>>   int f (void)
>>   {
>>     const char *p = (char*)wc;
>>     return strlen (p);            // not folded
>>   }
>>
>> To detect it, I'd have to use some other function, like
>> c_getstr().  I could certainly do that but I don't think
>> I should need to.
> And that's on the list of things we're going to try and address next.  I
> don't think it needs to be conflated with change which indicated to
> c_strlen how to count.

The two are one and the same: if c_strlen() doesn't count bytes
in arrays of wide characters (wchar_t, char16_t, or char32_t)
then the original GCC behavior is not restored and the missing
nul detection won't work for these "mixed type" cases.

> I do hope you're getting all these testcases recorded.  I'd even support
> installing them into the suite now, properly xfailed.  It's easier to
> see how any patch under consideration affects the wider set of tests.

I haven't been recording any of them --  I have no idea where
this is going.  As I've said, these patches prevent some of
the work I have already submitted.  That's why I have been
objecting to them, including adding the ELTSIZE argument to
c_strlen().

Martin