This is the mail archive of the gcc-patches mailing list for the GCC project.
Re: [PATCH] Make strlen range computations more conservative
- From: Richard Biener <rguenther at suse dot de>
- To: Jeff Law <law at redhat dot com>,Bernd Edlinger <bernd dot edlinger at hotmail dot de>,GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: Jakub Jelinek <jakub at redhat dot com>,Martin Sebor <msebor at gmail dot com>
- Date: Thu, 09 Aug 2018 08:27:09 +0200
- Subject: Re: [PATCH] Make strlen range computations more conservative
- References: <AM5PR0701MB2657A771EAD27A906CE47D0AE4550@AM5PR0701MB2657.eurprd07.prod.outlook.com> <firstname.lastname@example.org> <AM5PR0701MB26578AAB7E211795635E4D0EE4550@AM5PR0701MB2657.eurprd07.prod.outlook.com> <email@example.com>
On August 9, 2018 7:26:19 AM GMT+02:00, Jeff Law <firstname.lastname@example.org> wrote:
>On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
>> On 07/24/18 23:46, Jeff Law wrote:
>>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>>> This patch makes strlen range computations more conservative.
>>>> Firstly, if there is a visible type cast from type A to B before
>>>> passing the value to strlen, don't expect the type layout of B to
>>>> restrict the possible return value range of strlen.
>>> Why do you think this is the right thing to do?  ie, is there
>>> something in the standards that makes you think the code as it
>>> stands today is incorrect from a conformance standpoint?  Is there a
>>> significant body of code that is affected in an adverse way by the
>>> current code?  If so, what code?
>> I think if you have an object of an effective type A, say a char
>> array, you can cast the address of A to B, say typedef char (*B)[2]
>> for instance, and then to const char *, say for use in strlen.  I may
>> be wrong, but I think that we should at least try not to pick up
>> char[2] from B, but use A for strlen ranges, or leave this range
>> open.  Currently the range info for strlen is [0..1] in this case,
>> even if we see the type cast in the generic tree.
>Coming back to this...  I'd like to hope we can depend on the type of
>the strlen argument.  Obviously if it's a char *, we know nothing.  But
>if it's an ARRAY_TYPE it'd be advantageous if we could use the array
>size to bound the length.
But the FE made it char *, and only because pointer type conversions are useless may we see something else. So we cannot use the type of the argument.
>*But* we have code which will turn pointer access into array indexing.
>tree-ssa-forwprop.c can do that, there may be others.  So if we
>originally had a char * argument to strlen, but forwprop changed it to
>a char[N] type, then have we just broken things?  I'd totally
>forgotten about this behavior from forwprop.  PR 46393 contains sample
>code where this happens.
If we really still have this it must be very constrained. Because we used to have all sorts of wrong-code issues with this and data dependence analysis.
>I also thought we had code to recover array indexing from pointer
>arithmetic in the C/C++ front-end, but I can't seem to find it tonight.
>But it would raise similar concerns.
I've removed that. What I did at some point is avoid decaying arrays to pointers too eagerly in the FEs, to preserve array refs instead of trying to reconstruct them late.
>> One other example I have found in one of the test cases:
>>   char c;
>>   if (strlen(&c) != 0) abort();
>> This is now completely elided, but why?  Is there a code base where
>> that is used?  I doubt it, but why do we care to eliminate something
>> stupid like that?  If we would emit a warning for that I'm fine with
>> it.  But if we silently remove code like that I don't think that it
>> will improve anything.  So I ask, where is the code base which gets
>> an improvement from that optimization?
>I think it falls out naturally from trying to get accurate length
>information.  I don't think either Martin or I really care about
>optimizing strlen in this case.  In fact it's so clearly erroneous
>that it ought to generate a diagnostic on its own.  Knowing Martin it
>was probably included in the tests for completeness.
>However, there is a fair amount of code that passes addresses of
>characters to functions that want char * string arguments and those
>functions promptly walk off the end of the single character,
>unterminated string. We actually just saw one of these in glibc that
>was detected by Martin's recent work. So it's definitely useful to
>track how these kinds of values get used.
>>> Ultimately we want highly accurate string lengths to help improve
>>> the quality of the warnings we generate for potentially dangerous
>>> code.  These changes seem to take us in the opposite direction.
>> No, I don't think so; we have full control over the direction.  When
>> I do what Richi requested in his response, we will have one function
>> that the string length estimation is based upon, instead of several
>> open-coded tree walks.
>I don't think anyone objects to consolidating the length computation.
>What I think we're hashing through is how the object model in GIMPLE
>affects the assumptions that can be correctly made about the lengths
>of objects.  When I ACK'd Martin's patches I'd totally forgotten about
>these issues in GIMPLE and the impact they'd have if they were used in
>areas that affect code generation.  That is absolutely and totally my
>mistake.
>I suspect that we're ultimately going to have to refine the design a
>bit so that the lengths that impact code generation/optimization are
>distinct from those that are used for warnings.  I'm not keen on this
>concept, but I believe it's better than just reverting all the work on
>issuing diagnostics for fishy code.
>We're going to be kicking this around immediately -- there's a concern
>that some of this may have gotten into the gcc-7 and/or gcc-8 codebase.
>We need to get a sense of scale of the damage as well as a sense of
>scale for how to go about fixing the codegen issues while still keeping
>the benefits in the warning code.
>If we go quiet, it's not from a lack of caring about this issue. Quite
>the opposite, we want to make sure we address these issues correctly
>without just churning on the trunk.