[PATCH] Fix wrong code with truncated string literals (PR 86711/86714)

Jeff Law law@redhat.com
Fri Aug 3 21:15:00 GMT 2018


On 07/30/2018 02:21 PM, Bernd Edlinger wrote:
> On 07/30/18 21:52, Martin Sebor wrote:
>> On 07/30/2018 09:24 AM, Bernd Edlinger wrote:
>>> On 07/30/18 01:05, Martin Sebor wrote:
>>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote:
>>>>> Hi!
>>>>>
>>>>> This fixes two wrong code bugs where string_constant
>>>>> returns over length string constants.  Initializers
>>>>> like that are rejected in C++, but valid in C.
>>>>
>>>> If by valid you are referring to declarations like the one in
>>>> the added test:
>>>>
>>>>      const char a[2][3] = { "1234", "xyz" };
>>>>
>>>> then (as I explained), the excess elements in "1234" make
>>>> the char[3] initialization and thus the test case undefined.
>>>> I have resolved bug 86711 as invalid on those grounds.
>>>>
>>>> Bug 86711 has a valid test case that needs to be fixed, along
>>>> with bug 86688 that I raised for the same underlying problem:
>>>> considering the excess nul as part of the string.  As has been
>>>> discussed in a separate bug, rather than working around
>>>> the excessively long strings in the middle-end, it would be
>>>> preferable to avoid creating them to begin with.
>>>>
>>>> I'm already working on a fix for bug 86688, in part because
>>>> I introduced the code change and also because I'm making other
>>>> changes in this area -- bug 86552.  Both of these in response
>>>> to your comments.
>>>>
>>>
>>> Sorry, I must admit, I have completely lost track on how many things
>>> you are trying to work in parallel.
>>>
>>> Nevertheless I started to review you pr86552 patch here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html
>>>
>>> But so far you did not respond to me.
>>>
>>> Well actually I doubt your patch does apply to trunk,
>>> maybe you start to re-base that one, and post it again
>>> instead?
>>
>> I read your comments and have been busy working on enhancing
>> the patch (among other things).  There are a large number of
>> additional contexts where constant strings are expected and
>> where a missing nul needs to be detected.  Some include
>> additional instances of strlen calls that my initial patch
>> didn't handle, many more others that involve other string
>> functions.  I have posted an updated patch that applies
>> cleanly and that handles the first set.
>>
>> There is also a class of problems involving constant character
>> arrays initialized by a braced list, as in char [] = { x, y, z };
>> Those are currently not recognized as strings even if they are
>> nul-terminated, but they are far more likely to be a source of
>> these problems than string literals, certainly in C++ where
>> string initializers must fit in the array.  I am testing a patch
>> to convert those into STRING_CST so they can be handled as well.
>>
>> Since initializing arrays with more elements than fit is
>> undefined in C and since the behavior is undefined at compile
>> time it seems to me that rejecting such initializers with
>> a hard error (as opposed to a warning) would be appropriate
>> and obviate having to deal with them in the middle-end.
>>
> 
> We do not want to change what is currently accepted by the
> front end. period.
?!?  I don't follow this.  When we can detect an error, we should issue
a suitable diagnostic.  As is mentioned later in the thread, some
diagnostics are considered "pedantic" and are conditional.  But I don't
see how you get to "We do not want to change what is currently accepted
by the front end. period."  That seems like far too strong of a statement.

I'm not sure I agree with this being a pedantic error only.  See below...
> 
> But there is no reason why ambiguous string constants
> have to be passed to the middle end.
Agreed -- if we're not outright rejecting, then we should present the
middle end with something reasonable.   But....

> 
> For instance char c[2] = "a\0"; should look like c[1] = "a";
> while c[2] = "aaa"; should look like c[2] = "aa"; varasm.c
> will cut the excess precision off anyway.
I get the first case.  The second is less clear.  One could argue that
c[1] should be NUL or one could argue that c[1] should be 'a'.   It's
the inability to know what is "more correct" and the security
implications of leaving the string unterminated that drives my opinion
that this should not be a pedantic error, but instead a hard error.

> 
> That is TREE_STRING_LENGTH (str) == 3 and TREE_STRING_POINTER(str) = "aa\0";
See above.  That's going to leave an unterminated string in the
resulting code.  That has a negative security impact.

> 
> I propose to have all STRING_CST always be created by the
> FE with explicit nul termination, but the
> TYPE_SIZE_UNIT (TREE_TYPE (str)) >= TREE_STRING_LENGTH(str) in normal case (null-terminated)
> TREE_SIZE_UNIT (TREE_TYPE (str)) < TREE_STRING_LENGTH(str) if non zero terminated,
> truncated in the initializer.
> 
> Do you understand what I mean?
I think you're starting from the wrong basis, namely that we shouldn't
be issuing a hard error for excess initializers like this when we don't
have a clear cut best way to handle it.

jeff



More information about the Gcc-patches mailing list