This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

On 08/21/18 01:23, Martin Sebor wrote:
> On 08/20/2018 04:17 PM, Jeff Law wrote:
>> On 08/18/2018 11:38 AM, Martin Sebor wrote:
>>> On 08/17/2018 09:32 PM, Jeff Law wrote:
>>>> On 08/17/2018 02:17 PM, Martin Sebor wrote:
>>>>> On 08/17/2018 12:44 PM, Bernd Edlinger wrote:
>>>>>> On 08/17/18 20:23, Martin Sebor wrote:
>>>>>>> On 08/17/2018 06:14 AM, Joseph Myers wrote:
>>>>>>>> On Fri, 17 Aug 2018, Jeff Law wrote:
>>>>>>>>> On 08/16/2018 05:01 PM, Joseph Myers wrote:
>>>>>>>>>> On Thu, 16 Aug 2018, Jeff Law wrote:
>>>>>>>>>>> restores previous behavior.  The sprintf bits want to count
>>>>>>>>>>> element
>>>>>>>>>>> sized chunks, which for wchars is 4 bytes (that count will then be
>>>>>>>>>>>    /* Compute the range the argument's length can be in.  */
>>>>>>>>>>> -  fmtresult slen = get_string_length (arg);
>>>>>>>>>>> +  int count_by = dir.specifier == 'S' || dir.modifier ==
>>>>>>>>>>> FMT_LEN_l ? 4 : 1;
>>>>>>>>>> I don't see how a hardcoded 4 is correct here.  Surely you need to
>>>>>>>>>> example
>>>>>>>>>> wchar_type_node to determine its actual size for this target.
>>>>>>>>> We did kick this around a little.  IIRC Martin didn't think that it
>>>>>>>>> was
>>>>>>>>> worth handling the 2 byte wchar case.
>>>>>>> Sorry, I think we may have miscommunicated -- I didn't think it
>>>>>>> was useful to pass a size of the character type to the function.
>>>>>>> I agree that passing in a hardcoded constant doesn't seem right
>>>>>>> (even if GCC's wchar_t were always 4 bytes wide).
>>>>>>> I'm still not sure I see the benefit of passing in the expected
>>>>>>> element size given that the patch causes c_strlen() fail when
>>>>>>> the actual element size doesn't match ELTSIZE.  If the caller
>>>>>>> asks for the number of bytes in a const wchar_t array it should
>>>>>>> get back the number bytes.  (I could see it fail if the caller
>>>>>>> asked for the number of words in a char array whose size was
>>>>>>> not evenly divisible by wordsize.)
>>>>>> I think in this case c_strlen should use the type which the %S format
>>>>>> uses at runtime, otherwise it will not have anything to do with
>>>>>> the reality.
>>>>> %S is not what I'm talking about.
>>>>> Failing in the case I described (a caller asking for the size
>>>>> in bytes of a constant object whose type is larger) prevents
>>>>> callers that don't care about the type from detecting the actual
>>>>> size of a constant.
>>>>> Specifically for sprintf, it means that the buffer overflow
>>>>> below is no longer diagnosed:
>>>>>   struct S { char a[2]; void (*pf)(void); };
>>>>>   void f (struct S *p)
>>>>>   {
>>>>>     const char *q = (char*)L"\x41424344\x45464748";
>>>>>     sprintf (p->a, "%s", q);
>>>>>   }
>>>> I don't think this is in the testsuite, is it?  I verified that there
>>>> was no regressions when I installed Bernd's patch and when I installed
>>>> yours.
>>> No, there are very few tests that exercise these kinds of mixed
>>> argument types.  Code like that is most likely the result of
>>> a mistake, but it's not the kind of a bug I had even thought
>>> about until some of the codegen issues with mixed argument types
>>> were brought up (PR 86711/86714).
>> Phew.  I was worried I'd somehow missed the failure or tested the wrong
>> patch or who knows what.
>> Can you add that test, xfailed for now to the testsuite?
> Done in r263676.
>>> I would just like the ability to get the length/size somehow
>>> so that the questionable code that started us down this path
>>> can be detected.  This is not just the sprintf(d, "%s", L"1")
>>> kind of a mismatch but also the missing nul detection in cases
>>> like:
>> [ ... ]
>> I know.  And ideally we'll be able to handle everything you want to.
>> But we also have to realize that sometimes we may have to punt.
>> Also remember that incremental progress is (usually) good.
>>>   const wchar_t a[1] = L"\xffffffff";
>>>   strcpy(d, (char*)a);
>> This touches on both the representational issue (excess elements in the
>> initializer) and how to handle a non-terminated string.  Both are issues
>> we're debating right now.
>>> I don't think this is necessarily an important use case to
>>> optimize for but it is one that GCC optimizes already nonetheless,
>>> and always has.  For example, this has always been folded into 4
>>> by GCC and still is even with the patch:
>>>   const __WCHAR_TYPE__ wc[] = L"\x12345678";
>>>   int f (void)
>>>   {
>>>     const char *p = (char*)wc;
>>>     return strlen (p);            // folded to 4
>>>   }
>>> It's optimized because fold_const_call() relies on c_getstr()
>>> which returns the underlying byte representation of the wide
>>> string, and despite c_strlen() now trying to prevent it.
>> And I think you're hitting on some of issues already raised in the thread.
>> In this specific case though ISTM 4 is the right answer.  We casted to a
>> char *, so that's what we should be counting.  Or am I missing
>> something?  Also note that's the value we'd get from the strlen C
>> library call IIUC.
> It is the right answer.  My point is that this is optimized
> but the change to c_strlen() prevents the same optimization
> in other similar cases.  For example, GCC 6 folds this into
> memcpy:
>    __builtin_strcpy (d, (char*)L"\x12345678");
> GCC 7 and 8 do too but get the byte count wrong (my bad).
> Current trunk doesn't optimize it.  If restoring the original
> behavior is the intent (and not just fixing the counting bug)
> then c_strlen() should be fixed to fold this again.
> (I'm not a fan of the strcpy to memcpy transformation because
> it makes other things difficult but that's beside the point.)

Yes, this is a good example why it is not good to fold
stuff that is invalid:

  __builtin_strcpy(d, (char*)L"\x12345678");
  __builtin_memcpy(d, (char*)L"\x12345678", 5);

because it folds away the invalidness, makes invalid things
valid, and prevents later passes to spot the bug.

As a matter of common sense, we should not fold invalid stuff
to valid stuff, even if that is historic behavior.


>>> The missing nul variant of the same test case isn't folded (it
>>> ends up calling the library strlen) but the bug cannot be
>>> detected using the modified c_strlen():
>>>   const __WCHAR_TYPE__ wc[1] = L"\x12345678";   // no nul
>>>   int f (void)
>>>   {
>>>     const char *p = (char*)wc;
>>>     return strlen (p);            // not folded
>>>   }
>>> To detect it, I'd have to use some other function, like
>>> c_getstr().  I could certainly do that but I don't think
>>> I should need to.
>> And that's on the list of things we're going to try and address next.  I
>> don't think it needs to be conflated with change which indicated to
>> c_strlen how to count.
> The two are one and the same: if c_strlen() doesn't count bytes
> in arrays of wide characters (wchar_t, char16_t, or char32_t)
> then the original GCC behavior is not restored and the missing
> nul detection won't work for these "mixed type" cases.
>> I do hope you're getting all these testcases recorded.  I'd even support
>> installing them into the suite now, properly xfailed.  It's easier to
>> see how any patch under consideration affects the wider set of tests.
> I haven't been recording any of them --  I have no idea where
> this is going.  As I've said, these patches prevent some of
> the work I have already submitted.  That's why I have been
> objecting to them, including adding the ELTSIZE argument to
> c_strlen().
> Martin

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]