[RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Thu Aug 22 18:50:00 GMT 2019

On 8/21/19 4:34 PM, Jeff Law wrote:
> On 8/16/19 4:21 PM, Martin Sebor wrote:
>> On 8/16/19 12:15 PM, Jeff Law wrote:
>>> On 8/16/19 12:09 PM, Marc Glisse wrote:
>>>> On Fri, 16 Aug 2019, Jeff Law wrote:
>>>>
>>>>> This patch improves our ability to detect dead stores by handling cases
>>>>> where the size memcpy, memset, strncpy, etc call is not constant.Â  This
>>>>> addresses some, but not all, of the issues in 80576.
>>>>>
>>>>> The key here is when the size is not constant we can make conservative
>>>>> decisions that still give us a chance to analyze the code for dead
>>>>> stores.
>>>>>
>>>>> Remember that for dead store elimination, we're trying to prove that
>>>>> given two stores, the second store overwrites (partially or fully) the
>>>>> same memory locations as the first store.Â  That makes the first store
>>>>> either partially or fully dead.
>>>>>
>>>>> When we encounter the first store, we set up a bitmap of bytes written
>>>>> by that store (live_bytes).Â  We then look at subsequent stores and
>>>>> clear
>>>>> the appropriate entries in the bitmap.
>>>>>
>>>>> If the first store has a nonconstant length argument we can use the
>>>>> range of the length argument (max) and the size of the destination
>>>>> object to make a conservative estimation of how many bytes are written.
>>>>>
>>>>> For the second store the conservative thing to do for a non-constant
>>>>> length is to use the minimum of the range of the length argument.
>>>>
>>>> So I guess it won't handle things like
>>>>
>>>> void f(char*p,int n){
>>>>  Â Â  __builtin_memset(p,3,n);
>>>>  Â Â  __builtin_memset(p,7,n);
>>>> }
>>>>
>>>> where we know nothing about the length, except that it is the same? Or
>>>> do you look at symbolic ranges?
>>> Nope.Â  I think ao_ref can represent that, so it'd just be a matter of
>>> recording "n" as the length, then verifying that the second call's
>>> length is "n" as well.Â  That makes the first call dead.Â  We'd have to
>>> bypass the byte tracking in that case, but I think that's trivial
>>> because we already have a means to do that when the sizes are too large.
>>>
>>>>
>>>>> This doesn't come up a lot in practice.Â  But it also happens to put
>>>>> some
>>>>> of the infrastructure in place to handle strcpy and strcpy_chk which
>>>>> are
>>>>> needed to fully resolve 80576.
>>>>>
>>>>> Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
>>>>> ppc32, aarch64, sparc, s390x and probably others.Â  Also verified that
>>>>> the tests work on the various *-elf targets in my tester.
>>>>>
>>>>> OK for the trunk?
>>>>
>>>> ENOPATCH
>>> Opps.Â Â Â  Attached.
>>
>> It's an improvement and I realize you said it doesn't handle
>> everything and that you don't think it comes up a lot, but...
>> I would actually expect the following example (from the bug)
>> not to be that uncommon:
>>
>>  Â  void g (char *s)
>>  Â  {
>>  Â Â Â  char a[8];
>>  Â Â Â  __builtin_strcpy (a, s);
>>  Â Â Â  __builtin_memset (a, 0, sizeof a);
>>  Â Â Â  f (a);
>>  Â  }
>>
>> or at least to be more common than the equivalent alternative
>> the improvement does optimize:
>>
>>  Â  void g (char *s)
>>  Â  {
>>  Â Â Â  char a[8];
>>  Â Â Â  __builtin_memcpy (a, s,Â  __builtin_strlen (s));
>>  Â Â Â  __builtin_memset (a, 0, 8);
>>  Â Â Â  f (a);
>>  Â  }
>>
>> It seems that making the first one work should be just a matter
>> of handling strcpy analogously (the string length doesn't matter).
>>
>> As an aside, the new tests make me realize that -Wstringop-overflow
>> should be enhanced to detect this problem (i.e., a consider
>> the largest size in a PHI).
> Certainly not seeing much of these in the code I've looked at.  It may
> be a case that aliasing gets in the way often.
> 
> Also note that we've got the same issues in this space that we did with
> the strlen optimization improvements for last year.  I totally spaced
> that and the net result may be we have to avoid using the type size for
> anything in DSE which may make it impossible to really do a good job.

The issue with the strlen optimization was that it relied on
the type of subobjects of multidimensional arrays and structs
to determine the maximum valid size of the access to them.
This ran afoul of assumptions in code that doesn't respect
subobject boundaries.

This is not a concern for outermost objects like in the example
above.  strlen and other parts of GCC still make use of their
size for optimization.  The middle-end diagnostics I've been
adding still expect code to respect subobject boundaries.  I've
been doing that for this very reason: to let us do a better job
not just optimizing code, but also diagnosing bugs: find more
real problems with fewer false positives.

Martin