This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Make strlen range computations more conservative
- From: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
- To: Martin Sebor <msebor at gmail dot com>, Jakub Jelinek <jakub at redhat dot com>
- Cc: Richard Biener <rguenther at suse dot de>, Jeff Law <law at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Sun, 5 Aug 2018 06:51:23 +0000
- Subject: Re: [PATCH] Make strlen range computations more conservative
- References: <93caaaa6-d6d1-0d4d-c735-b4d9d5bcce07@gmail.com> <AM5PR0701MB2657A665191962F33739DBECE42A0@AM5PR0701MB2657.eurprd07.prod.outlook.com> <8b0e06a1-eea4-418e-35df-c394766bea10@gmail.com> <20180731063839.GC17988@tucnak> <3d6899a7-4536-253e-e082-819301e6ab38@gmail.com> <20180731154812.GF17988@tucnak> <933a1c4a-8cd0-a538-1e7e-d481b7d6ce80@gmail.com> <alpine.LSU.2.20.1808010915130.16707@zhemvz.fhfr.qr> <20180801084015.GH17988@tucnak> <bd5f43cf-92ac-0038-a493-ad0d9e68debf@gmail.com> <20180803074339.GJ17988@tucnak> <24618943-18a5-4406-9492-c60bd4ec3f08@gmail.com>
On 08/04/18 22:52, Martin Sebor wrote:
> On 08/03/2018 01:43 AM, Jakub Jelinek wrote:
>> On Thu, Aug 02, 2018 at 09:59:13PM -0600, Martin Sebor wrote:
>>>> If I call this with foo (2, 1), do you still claim it is not valid C?
>>>
>>> String functions like strlen operate on character strings stored
>>> in character arrays. Calling strlen (&s[1]) is invalid because
>>> &s[1] is not the address of a character array. The fact that
>>> objects can be represented as arrays of bytes doesn't change
>>> that. The standard may be somewhat loose with words on this
>>> distinction but the intent certainly isn't for strlen to traverse
>>> arbitrary sequences of bytes that cross subobject boundaries.
>>> (That is the intent behind the raw memory functions, but
>>> the current text doesn't make the distinction clear.)
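A minimal C sketch of the distinction drawn above. It assumes a hypothetical struct pair and the hypothetical helpers member_len and first_zero_byte, none of which come from the thread. The first helper measures a string without leaving the member that holds it. The second scans the raw bytes of the whole object, which is the job of memchr and the other raw memory functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, for illustration only. */
struct pair { char a[2]; char b[4]; };

/* Length of the string in p->a without reading past the member.
   Returns sizeof p->a when a holds no nul, which is the case where
   strlen (p->a) would run on into b. */
size_t member_len (const struct pair *p)
{
  const char *nul = memchr (p->a, '\0', sizeof p->a);
  return nul ? (size_t) (nul - p->a) : sizeof p->a;
}

/* Offset of the first zero byte anywhere in *p: a raw memory scan
   over the whole object rather than a string operation on a member. */
size_t first_zero_byte (const struct pair *p)
{
  const unsigned char *bytes = (const unsigned char *) p;
  const unsigned char *z = memchr (bytes, 0, sizeof *p);
  return z ? (size_t) (z - bytes) : sizeof *p;
}
```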
>>
>> But the standard doesn't say that right now.
>
> It does, in the restriction on multi-dimensional array accesses.
> Given the array 'char a[2][2];' it's only valid to access a[0][0],
> a[0][1], a[1][0], and a[1][1]. It's not valid to access a[0][2]
> or a[0][3], even though they happen to be located at the same
> addresses as a[1][0] and a[1][1].
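A small C sketch of the row restriction, using hypothetical function names. The pointer a[0] + 2 is the one-past-the-end pointer of the row a[0]. Forming and comparing it is allowed, and C11 6.5.9p6 even lets it compare equal to &a[1][0]. Dereferencing it is not allowed, and an expression such as a[0][2] does exactly that. Valid traversals go through the row each element belongs to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Valid: forms and compares the one-past-the-end pointer of a[0]
   without dereferencing it. */
bool row_end_meets_next_row (char a[2][2])
{
  return a[0] + 2 == &a[1][0];
}

/* Valid: every element is reached through its own row, so the
   inner index never leaves a[i]. */
int sum_by_rows (char a[2][2])
{
  int sum = 0;
  for (size_t i = 0; i < 2; i++)
    for (size_t j = 0; j < 2; j++)
      sum += a[i][j];
  return sum;
}
```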
>
> There is no exception for distinct struct members. So in
> a struct { char a[2], b[2]; }, even though a and b are laid
> out the same way as a char[2][2] would be, it's not valid to
> treat them as such. There is no distinction between array
> subscripting and pointer arithmetic, so it doesn't matter
> what form the access takes.
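A sketch using the struct from the paragraph above. The name first_of_b is hypothetical. The expression p->a[2] would land on the same address as p->b[0], but it is not a valid way to reach b. The conventional byte-level route starts from a pointer to the whole object and adds offsetof, and it does not depend on there being no padding between the members.

```c
#include <stddef.h>

/* The struct from the discussion; first_of_b is a hypothetical helper. */
struct two { char a[2], b[2]; };

/* Not valid under the reading above, although the address is the
   same as &p->b[0]:   return p->a[2];   */

/* Valid: b[0] reached from a pointer to the whole object. */
char first_of_b (const struct two *p)
{
  const char *base = (const char *) p;
  return base[offsetof (struct two, b)];
}
```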
>
> Yes, the standard could be clearer. There probably even are
> ambiguities and contradictions (the authors of the Object Model
> proposal believe there are and are trying to clarify or remove
> them). But the intent is clearly there. It's especially
> important for adjacent members of different types (say, a char[8]
> followed by a function pointer: we definitely don't want writes
> to the array to be allowed to change the function pointer).
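For the case of a char[8] followed by a function pointer, the usual defensive pattern is a write bounded by the member's own size. The sketch below assumes a hypothetical struct handler and a hypothetical set_name helper.

```c
#include <stdio.h>

/* Hypothetical struct: a char array followed by a function pointer. */
struct handler {
  char name[8];
  void (*fn) (void);
};

/* Writes at most sizeof h->name bytes, including the terminating nul,
   so an over-long src is truncated instead of reaching h->fn. */
void set_name (struct handler *h, const char *src)
{
  snprintf (h->name, sizeof h->name, "%s", src);
}
```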
>
>> Plus, at least from the middle-end POV, there is also the case of
>> placement new and stores changing the dynamic type of the object:
>> previously, say, a struct with two fields, then a placement new of a
>> single char array over it. The placement new will not survive in the
>> middle-end, so it will be just a memcpy or strcpy or some other byte
>> copy over the original object, and because pointer-to-pointer
>> conversions are useless in the middle-end and get looked through by
>> CSE/SCCVN etc., you can see a pointer to the struct with two fields
>> rather than a pointer to a char array.
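A C approximation of that situation follows. C has no placement new, so the sketch assumes allocated storage instead. The struct S mirrors the testcase below, and reuse_as_string is a hypothetical name. Under C11 6.5p6, stores into allocated storage can give it contents of a different effective type. The string copied over the struct can therefore legitimately be longer than sizeof p->a, even though the pointer passed to strlen has the address of p->a.

```c
#include <stdlib.h>
#include <string.h>

/* Mirrors struct S from the testcase below. */
struct S { char a; char b[64]; };

/* Hypothetical helper: uses one allocated block first as a struct S
   and then as a plain character string. strlen (q) must see the whole
   copied string, even though q has the same address as &p->a. */
size_t reuse_as_string (void)
{
  struct S *p = malloc (sizeof *p);
  if (!p)
    return 0;
  p->a = 'x';
  char *q = (char *) p;
  strcpy (q, "hello");
  size_t n = strlen (q);
  free (p);
  return n;
}
```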
>
> There may be challenges in the middle-end; you would know much
> better than I would. All I'm saying is that it's not valid to access
> [sub]objects by dereferencing pointers to other subobjects. All
> the examples in this discussion have been of that form.
>
These examples do not aim to be valid C; they just point out limitations
of the middle-end design, and many of the problems are due
to trying to do things that are not safe within the boundaries given
by the middle-end design.
Bernd.
>>
>> Consider e.g.
>> typedef __typeof__ (sizeof 0) size_t;
>> void *operator new (size_t, void *p) { return p; }
>> void *operator new[] (size_t, void *p) { return p; }
>> struct S { char a; char b[64]; };
>> void baz (char *);
>>
>> size_t
>> foo (S *p)
>> {
>> baz (&p->a);
>> char *q = new (p) char [16];
>> baz (q);
>> return __builtin_strlen (q);
>> }
>>
>> I don't think it is correct to say that strlen must be 0. In this testcase
>> the pointer passed to strlen is still S *, though I think with enough
>> tweaking you could also have something where the argument is &p->a.
>
> I think the problem here is changing the type of p->a. I'm
> not up on the latest C++ changes here, but I think it's a known
> problem with the specification. A similar (known) problem also
> comes up in the case of dynamically allocated objects:
>
> char *p = (char*)operator new (2);
> char *p1 = new (p) char ('a');
> char *p2 = new (p + 1) char ('\0');
> strlen (p1);
>
> Is the strlen(p1) call valid when there's no string or array
> at p? There is a singleton char object that just happens
> to be followed by another singleton char object. It's not
> an array of two elements. Each is [an array of] one char.
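For comparison, the C analogue has a defined answer. The sketch assumes plain malloc in place of operator new, and two_chars_len is a hypothetical name. C11 7.22.3p1 allows the allocated space to be used as an array of objects. That makes p[0] and p[1] elements of one char array, so strlen may step from the first to the second.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: p[0] and p[1] are elements of one char array
   in the allocated space, so strlen (p) is well-defined here.
   Returns (size_t) -1 if the allocation fails. */
size_t two_chars_len (void)
{
  char *p = malloc (2);
  if (!p)
    return (size_t) -1;
  p[0] = 'a';
  p[1] = '\0';
  size_t n = strlen (p);
  free (p);
  return n;
}
```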
>
> This is a (specification) problem for sequence containers like
> vector where, strictly speaking, it's not valid to iterate over
> their elements because of the array restriction.
>
>>
>> I have no problem with strlen returning 0 if it sees a toplevel object of
>> size 1, but note that if it is extern, it might already be a problem in some
>> cases:
>> struct T { char a; char a2[]; } b;
>> extern struct T c;
>> void foo (int *p) { p[0] = strlen (b); p[1] = strlen (c); }
>> If c's definition is struct T c = { ' ', "abcde" };
>> then the object doesn't have length of 1.
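A sketch of the extern case, using the struct T from the example. Initializing a flexible array member in a static object is a GCC extension, and trailing_len is a hypothetical name. sizeof (struct T) does not count the initializer of a2. A translation unit that sees only 'extern struct T c;' therefore has no way to know how large c really is. Reading the trailing string through c.a2 stays within that member.

```c
#include <string.h>

/* struct T and c as in the example above; the initializer for the
   flexible array member a2 relies on a GCC extension. */
struct T { char a; char a2[]; };
struct T c = { ' ', "abcde" };

/* Hypothetical helper: length of the string in the flexible member. */
size_t trailing_len (void)
{
  return strlen (c.a2);
}
```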
>
> I'm assuming above you meant strlen(&b) and strlen(&c) (or,
> equivalently, strlen(&b.a) and strlen(&c.a)). If so, it's
> the same problem: the strlen call is invalid unless b.a and
> c.a are nul.
>
> Martin