This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

From: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
To: Martin Sebor <msebor at gmail dot com>, Joseph Myers <joseph at codesourcery dot com>, Jeff Law <law at redhat dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Jakub Jelinek <jakub at redhat dot com>, Richard Biener <rguenther at suse dot de>
Date: Mon, 20 Aug 2018 23:04:45 +0000
Subject: Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
References: <AM5PR0701MB2657793BFB16DAA96D7806C8E4380@AM5PR0701MB2657.eurprd07.prod.outlook.com> <5542dc0d-e229-2d0a-168f-6cc02fd7eda2@redhat.com> <alpine.DEB.2.20.1808162259100.11323@digraph.polyomino.org.uk> <c255f589-c72f-bc6b-804b-8882c2bb3f3f@redhat.com> <alpine.DEB.2.20.1808171209440.2229@digraph.polyomino.org.uk> <a70e6893-3172-b93e-145a-cc03d5daf4f8@gmail.com> <AM5PR0701MB2657F07232CE40E7015EAE44E43D0@AM5PR0701MB2657.eurprd07.prod.outlook.com> <8a038e38-8632-3cd4-f178-9717c471a5ef@gmail.com> <DB6PR0701MB26645EEC0913740D3786B83AE43D0@DB6PR0701MB2664.eurprd07.prod.outlook.com> <af0c3124-c77a-4682-1c6d-1f5ef21c09c0@gmail.com>

On 08/18/18 20:01, Martin Sebor wrote:
> On 08/17/2018 05:01 PM, Bernd Edlinger wrote:
>> On 08/17/18 22:17, Martin Sebor wrote:
>>> On 08/17/2018 12:44 PM, Bernd Edlinger wrote:
>>>> On 08/17/18 20:23, Martin Sebor wrote:
>>>>> On 08/17/2018 06:14 AM, Joseph Myers wrote:
>>>>>> On Fri, 17 Aug 2018, Jeff Law wrote:
>>>>>>
>>>>>>> On 08/16/2018 05:01 PM, Joseph Myers wrote:
>>>>>>>> On Thu, 16 Aug 2018, Jeff Law wrote:
>>>>>>>>
>>>>>>>>> restores previous behavior.  The sprintf bits want to count element
>>>>>>>>> sized chunks, which for wchars is 4 bytes (that count will then be
>>>>>>>>
>>>>>>>>>    /* Compute the range the argument's length can be in.  */
>>>>>>>>> -  fmtresult slen = get_string_length (arg);
>>>>>>>>> +  int count_by = dir.specifier == 'S' || dir.modifier == FMT_LEN_l ? 4 : 1;
>>>>>>>>
>>>>>>>> I don't see how a hardcoded 4 is correct here.  Surely you need to example
>>>>>>>> wchar_type_node to determine its actual size for this target.
>>>>>>> We did kick this around a little.  IIRC Martin didn't think that it was
>>>>>>> worth handling the 2 byte wchar case.
>>>>>
>>>>> Sorry, I think we may have miscommunicated -- I didn't think it
>>>>> was useful to pass a size of the character type to the function.
>>>>> I agree that passing in a hardcoded constant doesn't seem right
>>>>> (even if GCC's wchar_t were always 4 bytes wide).
>>>>>
>>>>> I'm still not sure I see the benefit of passing in the expected
>>>>> element size given that the patch causes c_strlen() fail when
>>>>> the actual element size doesn't match ELTSIZE.  If the caller
>>>>> asks for the number of bytes in a const wchar_t array it should
>>>>> get back the number bytes.  (I could see it fail if the caller
>>>>> asked for the number of words in a char array whose size was
>>>>> not evenly divisible by wordsize.)
>>>>>
>>>>
>>>> I think in this case c_strlen should use the type which the %S format
>>>> uses at runtime, otherwise it will not have anything to do with
>>>> the reality.
>>>
>>> %S is not what I'm talking about.
>>>
>>> Failing in the case I described (a caller asking for the size
>>> in bytes of a constant object whose type is larger) prevents
>>> callers that don't care about the type from detecting the actual
>>> size of a constant.
>>>
>>> Specifically for sprintf, it means that the buffer overflow
>>> below is no longer diagnosed:
>>>
>>>    struct S { char a[2]; void (*pf)(void); };
>>>
>>>    void f (struct S *p)
>>>    {
>>>      const char *q = (char*)L"\x41424344\x45464748";
>>>
>>>      sprintf (p->a, "%s", q);
>>>    }
>>>
>>> There may be no other analyses that would benefit from this
>>> ability today but there easily could be.  There certainly
>>> are optimizations that depend on c_getstr() returning
>>> a pointer to the constant object regardless of its type
>>> (memchr being one of them).
>>>
>>
>> Yes, I agree.
>>
>> Coincidentally that is exactly what in my follow-up patch implements.
>> See: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01005.html
>>
>> If you call c_getstr(x) you get a valid zero-terminated single-byte
>> string or nothing.
>>
>> If you call c_getstr(x, &memsize) you get a pointer to a memory
>> of the specified memsize, regardless of the underlying type.
>> and whether or not zero terminated.
> 
> Most functions in GCC call c_strlen() at some point to determine
> the length of a string.  They need to be able to detect the missing
> nul -- it would double the amount of processing to have them also
> call c_getstr() when c_strlen() fails to see if the failure happens
> to be due to a missing nul -- it the overwhelming majority of cases
> it won't be.
> 

Well, yes, if the c_strlen function fails, it would be cheap to have some
error code for the failure reason.

But that is something for later.


Bernd.

References:
- [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Bernd Edlinger
- Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Jeff Law
- Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Joseph Myers
- Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Jeff Law
- Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Joseph Myers
- Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Martin Sebor
- Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Bernd Edlinger
- Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Martin Sebor
- Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Bernd Edlinger
- Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen
  - From: Martin Sebor

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]