[PATCH] Make strlen range computations more conservative

Fri Nov 16 17:26:00 GMT 2018

Just a reminder:

those are the two parts of this patch, which have been posted already
a while ago when we were still in stage 1:

https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00805.html

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01237.html

Bernd.

On 10/20/18 11:16 AM, Bernd Edlinger wrote:
> On 10/17/18 11:56 PM, Jeff Law wrote:
>> On 10/12/18 9:34 PM, Bernd Edlinger wrote:
>>> On 10/12/18 16:55, Jeff Law wrote:
>>>> On 9/15/18 2:43 AM, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this is an update on my strlen range patch (V7).  Again re-based and
>>>>> retested to current trunk.
>>>>>
>>>>> I am aware that Martin wants to re-factor the interface of get_range_strlen
>>>>> and have no objections against, but I'd suggest that to be a follow-up patch.
>>>>>
>>>>> I might suggest to rename one of the two get_range_strlen functions at the
>>>>> same time as it is rather confusing to have to count the parameters in order
>>>>> to tell which function is meant.
>>>>>
>>>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>>>> Is it OK for trunk?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>>
>>>>>
>>>>> changelog-range-strlen-v7.txt
>>>>>
>>>>> gcc:
>>>>> 2018-08-26  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>
>>>>>     * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New
>>>>>     helper function for strlen range estimations.
>>>>>     (get_range_strlen): Use looks_like_a_char_array_without_typecast_p
>>>>>     for warnings, but use GIMPLE semantics otherwise.
>>>>>     * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics.
>>>>>     (get_min_string_length): Avoid not NUL terminated string literals.
>>>> The introduction of looks_like_a_char_array_without_typecast_p is
>>>> probably a good thing.  Too  much code is already implemented inline
>>>> within get_range_strlen.
>>>>
>>>> It looks like you added handling of ARRAY_RANGE_REF.  I don't know how
>>>> often they come up in practice, but handling it seems like a reasonable
>>>> extension to what we're doing.  Bonus points if it's triggering with any
>>>> kind of consistency.
>>>>
>>>
>>> I did only want to be consistent with get_inner_reference here,
>>> but did not have encountered these, probably only an Ada thing?
>> Trying to be consistent with get_inner_reference is fine :-)  GCC
>> supports case ranges as an extension for C/C++.  No clue if they're
>> natively supported by Ada or any other langauge.
>>
>>
>>
>>>
>>>> I actually prefer Martin's unification of type/fuzzy into a single
>>>> enumeration to describe the desired behavior.  Doing it with two args
>>>> where some values are mutually exclusive is just asking for trouble.
>>>> Though I like that you called out the values that are mutually exclusive.
>>>>
>>>> I definitely want to look at how your patch and Martin's differ on the
>>>> handling of flexible array members -- clearly we must avoid setting a
>>>> range in that case.  I'm surprised this didn't trigger a failure in the
>>>> testsuite though.  Martin's work in this space did.
>>>>
>>>> The bugfix in get_min_string_length looks like it probably stands on its
>>>> own.
>>>>
>>>> I'm still evaluating the two approaches...
>>>>
>>>
>>> One thing I should mention is, that there is still one place where opportunistic
>>> range info influence conde gen.  I mean at least with my patch.
>> ACK.   That's soemthing Martin's patch does address.  AT least it's
>> supposed to.
> 
> Okay, based on my previous patch I can of course do the same.
> 
> See attached.  This was bootstrapped and reg-tested together with my
> previous patch.  The only "regression" was pr79376.c, which is xfailed
> because the test case is expecting the return value to be in the limits given
> by in the opportunistic range info.
> 
> While I think the strlen return optimization will be safe with this patch,
> I have however still a philosophical problem with it, because s[n]printf
> is a highly complex piece of software, and we take it away the right
> to return a failure code, when it has to because of an implementation bug.
> 
>>>
>>> That is the return value from sprintf is using the range info from the
>>> warning, and uses that to set the range info of the result.
>>> In try_substitute_return_value, which uses the range info that was
>>> from the warnings and feeds that into set_range_info.
>> Right.  In Martin's work we have enough range info to distinguish
>> between the range info for warnings and the true range info and only use
>> the latter in the call to set_range_info.
>>
>>
> 
> Well I have tried the test cases from Martins patch, and all except one
> work fine for me, and pass with my patch-set as well.
> 
> The problematic one is strlenopt-59.c (in his patch, my patch has picked
> the same name, unfortunately).
> 
> The difference is how object declarations are handled.  While my patch
> does not try to solve that problem at all, his patch does probably look
> at the declaration size to improve the strict limits.
> 
> I am not totally against it, but do not feel any need to implement that
> feature in the same patch together with a function interface change, and
> a code-correctness fix.
> 
>  From the test case it looks like the globals are comdat objects, because there
> is no initialization.  You can declare "char a3[3];" and "char a3[100];" in
> different translation units and it will be a3[100] at run-time.
> 
> For me the red line here is basically, that the strlen optimization should
> _not_ be more aggressive than the loop-niter optimization, thus the lackmus
> test is, would the test case pass if strlen is implemented as:
> 
> #define strlen(c) ({ __SIZE_TYPE__ _n; for(_n=0; (c)[_n]; _n++); _n; })
> 
> Well, it does not.  But that should probably considered as a goal.
> 
> 
> 
> Bernd.