This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Fix wrong code with truncated string literals (PR 86711/86714)

From: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
To: Jeff Law <law at redhat dot com>, Martin Sebor <msebor at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Richard Biener <rguenther at suse dot de>, Jakub Jelinek <jakub at redhat dot com>
Date: Fri, 3 Aug 2018 21:38:03 +0000
Subject: Re: [PATCH] Fix wrong code with truncated string literals (PR 86711/86714)
References: <DB6PR0701MB266438272BCFD277C6600399E4280@DB6PR0701MB2664.eurprd07.prod.outlook.com> <f6300126-1ebf-dd86-ed91-507b0770b495@gmail.com> <AM5PR0701MB2657173012DF86883B5B4E39E42F0@AM5PR0701MB2657.eurprd07.prod.outlook.com> <f301b52c-df0d-0a10-7ce1-940139020b91@gmail.com> <AM5PR0701MB2657F4C71E034EE9EFDCB3D6E42F0@AM5PR0701MB2657.eurprd07.prod.outlook.com> <1e3b7822-18dc-b36a-34c4-766fa8f7c87f@redhat.com>

On 08/03/18 23:15, Jeff Law wrote:
> On 07/30/2018 02:21 PM, Bernd Edlinger wrote:
>> On 07/30/18 21:52, Martin Sebor wrote:
>>> On 07/30/2018 09:24 AM, Bernd Edlinger wrote:
>>>> On 07/30/18 01:05, Martin Sebor wrote:
>>>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote:
>>>>>> Hi!
>>>>>>
>>>>>> This fixes two wrong code bugs where string_constant
>>>>>> returns over length string constants.  Initializers
>>>>>> like that are rejected in C++, but valid in C.
>>>>>
>>>>> If by valid you are referring to declarations like the one in
>>>>> the added test:
>>>>>
>>>>>       const char a[2][3] = { "1234", "xyz" };
>>>>>
>>>>> then (as I explained), the excess elements in "1234" make
>>>>> the char[3] initialization and thus the test case undefined.
>>>>> I have resolved bug 86711 as invalid on those grounds.
>>>>>
>>>>> Bug 86711 has a valid test case that needs to be fixed, along
>>>>> with bug 86688 that I raised for the same underlying problem:
>>>>> considering the excess nul as part of the string.  As has been
>>>>> discussed in a separate bug, rather than working around
>>>>> the excessively long strings in the middle-end, it would be
>>>>> preferable to avoid creating them to begin with.
>>>>>
>>>>> I'm already working on a fix for bug 86688, in part because
>>>>> I introduced the code change and also because I'm making other
>>>>> changes in this area -- bug 86552.  Both of these in response
>>>>> to your comments.
>>>>>
>>>>
>>>> Sorry, I must admit, I have completely lost track on how many things
>>>> you are trying to work in parallel.
>>>>
>>>> Nevertheless I started to review you pr86552 patch here:
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html
>>>>
>>>> But so far you did not respond to me.
>>>>
>>>> Well actually I doubt your patch does apply to trunk,
>>>> maybe you start to re-base that one, and post it again
>>>> instead?
>>>
>>> I read your comments and have been busy working on enhancing
>>> the patch (among other things).  There are a large number of
>>> additional contexts where constant strings are expected and
>>> where a missing nul needs to be detected.  Some include
>>> additional instances of strlen calls that my initial patch
>>> didn't handle, many more others that involve other string
>>> functions.  I have posted an updated patch that applies
>>> cleanly and that handles the first set.
>>>
>>> There is also a class of problems involving constant character
>>> arrays initialized by a braced list, as in char [] = { x, y, z };
>>> Those are currently not recognized as strings even if they are
>>> nul-terminated, but they are far more likely to be a source of
>>> these problems than string literals, certainly in C++ where
>>> string initializers must fit in the array.  I am testing a patch
>>> to convert those into STRING_CST so they can be handled as well.
>>>
>>> Since initializing arrays with more elements than fit is
>>> undefined in C and since the behavior is undefined at compile
>>> time it seems to me that rejecting such initializers with
>>> a hard error (as opposed to a warning) would be appropriate
>>> and obviate having to deal with them in the middle-end.
>>>
>>
>> We do not want to change what is currently accepted by the
>> front end. period.
> ?!?  I don't follow this.  When we can detect an error, we should issue
> a suitable diagnostic.  As is mentioned later in the thread, some
> diagnostics are considered "pedantic" and are conditional.  But I don't
> see how you get to "We do not want to change what is currently accepted
> by the front end. period."  That seems like far too strong of a statement.
> 

What I mean here is: we should fix a middle-end consistency issue with
the string constants.  When that is done, we can decide to make
change a pedantic error into a hard error, but IMHO it should be an independent
patch of it's own.

By the way, I found that Fortran has non-zero terminated strings with
index range starting from 1.
Then there are also overlength strings in Fortran, that have excess precision.
Should we forbid that Fortran feature as well?

Then Ada and Go do not have zero-terminated strings, but I do not know
if these can hit the strlen pass.
C++ has a -fpermissive option that also allows overlength strings,
but all test cases are C only. etc. etc.

> I'm not sure I agree with this being a pedantic error only.  See below...
>>
>> But there is no reason why ambiguous string constants
>> have to be passed to the middle end.
> Agreed -- if we're not outright rejecting, then we should present the
> middle end with something reasonable.   But....
> 
>>
>> For instance char c[2] = "a\0"; should look like c[1] = "a";
>> while c[2] = "aaa"; should look like c[2] = "aa"; varasm.c
>> will cut the excess precision off anyway.
> I get the first case.  The second is less clear.  One could argue that
> c[1] should be NUL or one could argue that c[1] should be 'a'.   It's
> the inability to know what is "more correct" and the security
> implications of leaving the string unterminated that drives my opinion
> that this should not be a pedantic error, but instead a hard error.
> 

Yes, in the moment I just look at how C works, and try to make that
the normal.

However I think those details can be discussed.


>>
>> That is TREE_STRING_LENGTH (str) == 3 and TREE_STRING_POINTER(str) = "aa\0";
> See above.  That's going to leave an unterminated string in the
> resulting code.  That has a negative security impact.
> 
>>
>> I propose to have all STRING_CST always be created by the
>> FE with explicit nul termination, but the
>> TYPE_SIZE_UNIT (TREE_TYPE (str)) >= TREE_STRING_LENGTH(str) in normal case (null-terminated)
>> TREE_SIZE_UNIT (TREE_TYPE (str)) < TREE_STRING_LENGTH(str) if non zero terminated,
>> truncated in the initializer.
>>
>> Do you understand what I mean?
> I think you're starting from the wrong basis, namely that we shouldn't
> be issuing a hard error for excess initializers like this when we don't
> have a clear cut best way to handle it.
> 
> jeff
>

Follow-Ups:
- Re: [PATCH] Fix wrong code with truncated string literals (PR 86711/86714)
  - From: Jeff Law

References:
- Re: [PATCH] Fix wrong code with truncated string literals (PR 86711/86714)
  - From: Jeff Law

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]