This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Make strlen range computations more conservative

From: Jeff Law <law at redhat dot com>
To: Bernd Edlinger <bernd dot edlinger at hotmail dot de>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Cc: Richard Biener <rguenther at suse dot de>, Jakub Jelinek <jakub at redhat dot com>, Martin Sebor <msebor at gmail dot com>
Date: Wed, 8 Aug 2018 23:26:19 -0600
Subject: Re: [PATCH] Make strlen range computations more conservative
References: <AM5PR0701MB2657A771EAD27A906CE47D0AE4550@AM5PR0701MB2657.eurprd07.prod.outlook.com> <28fed157-7221-f517-4d2a-0d3f74b19e29@redhat.com> <AM5PR0701MB26578AAB7E211795635E4D0EE4550@AM5PR0701MB2657.eurprd07.prod.outlook.com>

On 07/24/2018 05:18 PM, Bernd Edlinger wrote:
> On 07/24/18 23:46, Jeff Law wrote:
>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote:
>>> Hi!
>>>
>>> This patch makes strlen range computations more conservative.
>>>
>>> Firstly if there is a visible type cast from type A to B before passing
>>> then value to strlen, don't expect the type layout of B to restrict the
>>> possible return value range of strlen.
>> Why do you think this is the right thing to do?  ie, is there language
>> in the standards that makes you think the code as it stands today is
>> incorrect from a conformance standpoint?  Is there a significant body of
>> code that is affected in an adverse way by the current code?  If so,
>> what code?
>>
>>
> 
> I think if you have an object, of an effective type A say char[100], then
> you can cast the address of A to B, say typedef char (*B)[2] for instance
> and then to const char *, say for use in strlen.  I may be wrong, but I think
> that we should at least try not to pick up char[2] from B, but instead
> use A for strlen ranges, or leave this range open.  Currently the range
> info for strlen is [0..1] in this case, even if we see the type cast
> in the generic tree.
Coming back to this... I'd like to hope we can depend on the type of the
strlen argument.  Obviously if it's a char *, we know nothing.  But if
it's an ARRAY_TYPE it'd be advantageous if we could use the array bounds
to bound the length.

*But* we have code which will turn pointer access into array indexing.
tree-ssa-forwprop.c can do that, there may be others.  So if we
originally had a char * argument to strlen, but forwprop changed it into
a char[N] type, then have we just broken things?  I'd totally forgotten
about this behavior from forwprop.  PR 46393 contains sample code where
this happens.

I also thought we had code to recover array indexing from pointer
arithmetic in the C/C++ front-end, but I can't seem to find it tonight.
But it would raise similar concerns.

> 
> One other example I have found in one of the test cases:
> 
> char c;
> 
> if (strlen(&c) != 0) abort();
> 
> this is now completely elided, but why?  Is there a code base where
> that is used?  I doubt, but why do we care to eliminate something
> stupid like that?  If we would emit a warning for that I'm fine with it,
> But if we silently remove code like that I don't think that it
> will improve anything.  So I ask, where is the code base which
> gets an improvement from that optimization?
I think it falls out naturally from trying to get accurate computations.
 I don't think either Martin or I really care about optimizing strlen in
this case.  In fact it's so clearly erroneous that it ought to generate
a diagnostic on its own.  Knowing Martin it was probably included in the
tests for completeness.

However, there is a fair amount of code that passes addresses of
characters to functions that want char * string arguments and those
functions promptly walk off the end of the single character,
unterminated string.  We actually just saw one of these in glibc that
was detected by Martin's recent work.  So it's definitely useful to
track how these kinds of values get used.

> 
>> Ultimately we want highly accurate string lengths to help improve the
>> quality of the warnings we generate for potentially dangerous code.
>> These changes seem to take us in the opposite direction.
>>
> 
> No, I don't think so, we have full control on the direction, when
> I do what Richi requested on his response, we will have one function
> where the string length estimation is based upon, instead of several
> open coded tree walks.
I don't think anyone objects to consolidating length computation.  What
I think we're hashing through is how does the object model in GIMPLE
affect the assumptions that can be correctly made about lengths of
objects.  When I ACK'd Martin's patches I'd totally forgotten about
these issues in GIMPLE and the impact they'd have if they were used in
areas that affect code generation.  That is absolutely and totally my
mistake.

I suspect that we're ultimately going to have to refine the design a bit
so that the lengths that impact code generation/optimization are
distinct from those that are used for warnings.  I'm not keen on this
concept, but I believe it's better than just reverting all the work on
issuing diagnostics for fishy code.

We're going to be kicking this around immediately -- there's a concern
that some of this may have gotten into the gcc-7 and/or gcc-8 codebase.
We need to get a sense of scale of the damage as well as a sense of
scale for how to go about fixing the codegen issues while still keeping
the benefits in the warning code.

If we go quiet, it's not from a lack of caring about this issue.  Quite
the opposite, we want to make sure we address these issues correctly
without just churning on the trunk.

Jeff

Follow-Ups:
- Re: [PATCH] Make strlen range computations more conservative
  - From: Richard Biener

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]