This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix wrong code with truncated string literals (PR 86711/86714)
- From: Jeff Law <law at redhat dot com>
- To: Bernd Edlinger <bernd dot edlinger at hotmail dot de>, Martin Sebor <msebor at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Richard Biener <rguenther at suse dot de>, Jakub Jelinek <jakub at redhat dot com>
- Date: Tue, 14 Aug 2018 23:47:00 -0600
- Subject: Re: [PATCH] Fix wrong code with truncated string literals (PR 86711/86714)
- References: <DB6PR0701MB266438272BCFD277C6600399E4280@DB6PR0701MB2664.eurprd07.prod.outlook.com> <f6300126-1ebf-dd86-ed91-507b0770b495@gmail.com> <AM5PR0701MB2657173012DF86883B5B4E39E42F0@AM5PR0701MB2657.eurprd07.prod.outlook.com> <f301b52c-df0d-0a10-7ce1-940139020b91@gmail.com> <AM5PR0701MB2657F4C71E034EE9EFDCB3D6E42F0@AM5PR0701MB2657.eurprd07.prod.outlook.com> <1e3b7822-18dc-b36a-34c4-766fa8f7c87f@redhat.com> <AM5PR0701MB26572128FCA39F126E87C80CE4230@AM5PR0701MB2657.eurprd07.prod.outlook.com>
On 08/03/2018 03:38 PM, Bernd Edlinger wrote:
> On 08/03/18 23:15, Jeff Law wrote:
>> On 07/30/2018 02:21 PM, Bernd Edlinger wrote:
>>> On 07/30/18 21:52, Martin Sebor wrote:
>>>> On 07/30/2018 09:24 AM, Bernd Edlinger wrote:
>>>>> On 07/30/18 01:05, Martin Sebor wrote:
>>>>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> This fixes two wrong code bugs where string_constant
>>>>>>> returns over length string constants. Initializers
>>>>>>> like that are rejected in C++, but valid in C.
>>>>>>
>>>>>> If by valid you are referring to declarations like the one in
>>>>>> the added test:
>>>>>>
>>>>>> const char a[2][3] = { "1234", "xyz" };
>>>>>>
>>>>>> then (as I explained), the excess elements in "1234" make
>>>>>> the char[3] initialization and thus the test case undefined.
>>>>>> I have resolved bug 86711 as invalid on those grounds.
>>>>>>
>>>>>> Bug 86711 has a valid test case that needs to be fixed, along
>>>>>> with bug 86688 that I raised for the same underlying problem:
>>>>>> considering the excess nul as part of the string. As has been
>>>>>> discussed in a separate bug, rather than working around
>>>>>> the excessively long strings in the middle-end, it would be
>>>>>> preferable to avoid creating them to begin with.
>>>>>>
>>>>>> I'm already working on a fix for bug 86688, in part because
>>>>>> I introduced the code change and also because I'm making other
>>>>>> changes in this area -- bug 86552. Both of these in response
>>>>>> to your comments.
>>>>>>
>>>>>
>>>>> Sorry, I must admit, I have completely lost track on how many things
>>>>> you are trying to work in parallel.
>>>>>
>>>>> Nevertheless I started to review you pr86552 patch here:
>>>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html
>>>>>
>>>>> But so far you did not respond to me.
>>>>>
>>>>> Well actually I doubt your patch does apply to trunk,
>>>>> maybe you start to re-base that one, and post it again
>>>>> instead?
>>>>
>>>> I read your comments and have been busy working on enhancing
>>>> the patch (among other things). There are a large number of
>>>> additional contexts where constant strings are expected and
>>>> where a missing nul needs to be detected. Some include
>>>> additional instances of strlen calls that my initial patch
>>>> didn't handle, many more others that involve other string
>>>> functions. I have posted an updated patch that applies
>>>> cleanly and that handles the first set.
>>>>
>>>> There is also a class of problems involving constant character
>>>> arrays initialized by a braced list, as in char [] = { x, y, z };
>>>> Those are currently not recognized as strings even if they are
>>>> nul-terminated, but they are far more likely to be a source of
>>>> these problems than string literals, certainly in C++ where
>>>> string initializers must fit in the array. I am testing a patch
>>>> to convert those into STRING_CST so they can be handled as well.
>>>>
>>>> Since initializing arrays with more elements than fit is
>>>> undefined in C and since the behavior is undefined at compile
>>>> time it seems to me that rejecting such initializers with
>>>> a hard error (as opposed to a warning) would be appropriate
>>>> and obviate having to deal with them in the middle-end.
>>>>
>>>
>>> We do not want to change what is currently accepted by the
>>> front end. period.
>> ?!? I don't follow this. When we can detect an error, we should issue
>> a suitable diagnostic. As is mentioned later in the thread, some
>> diagnostics are considered "pedantic" and are conditional. But I don't
>> see how you get to "We do not want to change what is currently accepted
>> by the front end. period." That seems like far too strong of a statement.
>>
>
> What I mean here is: we should fix a middle-end consistency issue with
> the string constants. When that is done, we can decide to make
> change a pedantic error into a hard error, but IMHO it should be an independent
> patch of it's own.
Note Jakub has indicated why changing this to a hard error would be a
mistake and what the semantics ought to be in this scenario. so ignore
what I said about turning this into a hard error. I think we do
truncation per the defined semantics and present the middle end with
"canonicalized" literals.
>
> By the way, I found that Fortran has non-zero terminated strings with
> index range starting from 1.
> Then there are also overlength strings in Fortran, that have excess precision.
> Should we forbid that Fortran feature as well?
I have no clue.
>
> Then Ada and Go do not have zero-terminated strings, but I do not know
> if these can hit the strlen pass.
I wouldn't think so, but maybe in code that's meant to interoperate with
C. Or if they implemented a C looking strlen and GCC recognized it and
turned it into a strlen call. I don't think we do anything like that
now, but you never know when someone might add it.
jeff