[PATCH] Fix wrong code with truncated string literals (PR 86711/86714)

Martin Sebor msebor@gmail.com
Tue Jul 31 04:01:00 GMT 2018


On 07/30/2018 02:21 PM, Bernd Edlinger wrote:
> On 07/30/18 21:52, Martin Sebor wrote:
>> On 07/30/2018 09:24 AM, Bernd Edlinger wrote:
>>> On 07/30/18 01:05, Martin Sebor wrote:
>>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote:
>>>>> Hi!
>>>>>
>>>>> This fixes two wrong code bugs where string_constant
>>>>> returns over length string constants.  Initializers
>>>>> like that are rejected in C++, but valid in C.
>>>>
>>>> If by valid you are referring to declarations like the one in
>>>> the added test:
>>>>
>>>>      const char a[2][3] = { "1234", "xyz" };
>>>>
>>>> then (as I explained), the excess elements in "1234" make
>>>> the char[3] initialization and thus the test case undefined.
>>>> I have resolved bug 86711 as invalid on those grounds.
>>>>
>>>> Bug 86711 has a valid test case that needs to be fixed, along
>>>> with bug 86688 that I raised for the same underlying problem:
>>>> considering the excess nul as part of the string.  As has been
>>>> discussed in a separate bug, rather than working around
>>>> the excessively long strings in the middle-end, it would be
>>>> preferable to avoid creating them to begin with.
>>>>
>>>> I'm already working on a fix for bug 86688, in part because
>>>> I introduced the code change and also because I'm making other
>>>> changes in this area -- bug 86552.  Both of these in response
>>>> to your comments.
>>>>
>>>
>>> Sorry, I must admit, I have completely lost track on how many things
>>> you are trying to work in parallel.
>>>
>>> Nevertheless I started to review you pr86552 patch here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html
>>>
>>> But so far you did not respond to me.
>>>
>>> Well actually I doubt your patch does apply to trunk,
>>> maybe you start to re-base that one, and post it again
>>> instead?
>>
>> I read your comments and have been busy working on enhancing
>> the patch (among other things).  There are a large number of
>> additional contexts where constant strings are expected and
>> where a missing nul needs to be detected.  Some include
>> additional instances of strlen calls that my initial patch
>> didn't handle, many more others that involve other string
>> functions.  I have posted an updated patch that applies
>> cleanly and that handles the first set.
>>
>> There is also a class of problems involving constant character
>> arrays initialized by a braced list, as in char [] = { x, y, z };
>> Those are currently not recognized as strings even if they are
>> nul-terminated, but they are far more likely to be a source of
>> these problems than string literals, certainly in C++ where
>> string initializers must fit in the array.  I am testing a patch
>> to convert those into STRING_CST so they can be handled as well.
>>
>> Since initializing arrays with more elements than fit is
>> undefined in C and since the behavior is undefined at compile
>> time it seems to me that rejecting such initializers with
>> a hard error (as opposed to a warning) would be appropriate
>> and obviate having to deal with them in the middle-end.
>>
>
> We do not want to change what is currently accepted by the
> front end. period.

On whose behalf are you making such categorical statements?
It was Jakub and Richard's suggestion in bug 86714 to reject
the undefined excessive initializers and I happen to like
the idea.  I don't recall anyone making a decision about what
"we" do or don't want to change.

That said, if rejecting such initializers is not acceptable
an alternate solution that I believe Richard preferred is to
trim excess elements early on, e.g., during gimplification
(or at some other point after the front-end is done).  That's
okay with me too.

>
> But there is no reason why ambiguous string constants
> have to be passed to the middle end.
>
> For instance char c[2] = "a\0"; should look like c[1] = "a";
> while c[2] = "aaa"; should look like c[2] = "aa"; varasm.c
> will cut the excess precision off anyway.
>
> That is TREE_STRING_LENGTH (str) == 3 and TREE_STRING_POINTER(str) = "aa\0";
>
> I propose to have all STRING_CST always be created by the
> FE with explicit nul termination, but the
> TYPE_SIZE_UNIT (TREE_TYPE (str)) >= TREE_STRING_LENGTH(str) in normal case (null-terminated)
> TREE_SIZE_UNIT (TREE_TYPE (str)) < TREE_STRING_LENGTH(str) if non zero terminated,
> truncated in the initializer.
>
> Do you understand what I mean?

I don't insist on any particular internal representation as long
as it makes it possible to detect and diagnose common bugs, and
(ideally) also help mitigate their worst consequences.

Martin



More information about the Gcc-patches mailing list