[PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

Tue Aug 29 03:42:00 GMT 2017

On 08/24/2017 04:09 PM, Jeff Law wrote:
> On 08/22/2017 02:45 AM, Richard Biener wrote:
>> On Mon, Aug 21, 2017 at 10:10 PM, Martin Sebor <msebor@gmail.com> wrote:
>>> On 08/09/2017 10:14 AM, Jeff Law wrote:
>>>>
>>>> On 08/06/2017 05:08 PM, Martin Sebor wrote:
>>>>
>>>>>>
>>>>>> Well, simply because the way as implemented isn't a must-alias query
>>>>>> but maybe one that's good enough for warnings (reduces false positives
>>>>>> but surely doesn't eliminate them).
>>>>>
>>>>>
>>>>> I'm very interested in reducing the rate of false positives in
>>>>> these and all other warnings.  As I mentioned in my comments,
>>>>> I did my best to weed them out of the implementation by building
>>>>> GDB, Glibc, Busybox, and the Linux kernel.  That of course isn't
>>>>> a guarantee that there aren't any.  But the first implementation
>>>>> of any non-trivial feature is never perfect, and hardly any
>>>>> warning of sufficient complexity is free of false positives, no
>>>>> matter here it's implemented (the middle-end, front-end, or
>>>>> a standalone static analysis tool).  If you spotted some cases
>>>>> I had missed I'd certainly be grateful for examples.  Otherwise,
>>>>> they will undoubtedly be reported as more software is exposed
>>>>> to the warning and, if possible, fixed, as happens with all
>>>>> other warnings.
>>>>
>>>> I think Richi is saying that the must alias query you've built isn't
>>>> proper/correct.  It's less about false positives for Richi and more
>>>> about building a proper must alias query if I understand him correctly.
>>>>
>>>> I suspect he's also saying that you can't reasonably build must-alias on
>>>> top of a may-alias query framework.  They're pretty different queries.
>>>>
>>>> If you need something that is close to, but not quite a must alias
>>>> query, then you're going to have to make a argument for that and you
>>>> can't call it a must alias query.
>>>
>>>
>>> Attached is an updated and simplified patch that avoids making
>>> changes to any of the may-alias functions.  It turns out that
>>> all the information the logic needs to determine the overlap
>>> is already in the ao_ref structures populated by
>>> ao_ref_init_from_ptr_and_size.  The only changes to the pass
>>> are the enhancement to ao_ref_init_from_ptr_and_size to make
>>> use of range info and the addition of the two new functions
>>> used by the -Wrestrict clients outside the pass.
>>
>> Warning for memcpy (p, p, ...) is going to fire false positives all around
>> given the C++ FE emits those in some cases and optimization can
>> expose that we are dealing with self-assignments.  And *p = *p is
>> valid.
> Correct.  I wound my way through this mess a while back.  Essentially
> Red Hat had a customer with code that had overlapped memcpy arguments.
> We had them use the memstomp interposition library to start tracking
> these problems down.
>
> One of the things that popped up was structure/class copies which were
> implemented via calls to memcpy.    In the case of self assignment, the
> interposition library would note the overlap and (rightly IMHO) complain.

Is this bug 32667?  I'm not having any luck reproducing it with
any of the test cases there and varying struct sizes, or with
the test case in the duplicate bug 65029 I filed for the same
thing last year.  It would be nice to have a test case.

> One could argue that GCC should emit memmove by default for structure
> assignments, only using memcpy when it knows its not doing self
> assignment (which may be hard to determine).  Furthermore, GCC should
> eliminate self structure/class assignment.

If it's still a problem emitting memmove seems like the right
thing to do.  From what I've read the performance advantage of
memcpy over memmove seems debatable at best.  Most performance
sensitive code avoids making copies of very large objects so
the only code that can be impacted doesn't care about efficiency
quite so much.  For small enough objects, inlining the copy as
GCC already does would obviate the efficiency concern altogether.

>
>
>>
>> @@ -1028,6 +1066,10 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
>>             }
>>         }
>>
>> +      /* Avoid folding the call if overlap is detected.  */
>> +      if (check_overlap && detect_overlap (loc, stmt, dest, src, len))
>> +       return false;
>> +
>>
>> no, please not.  You diagnosed the call (which might be a false positive)
>> so why keep it undefined?  The folded stmt will either have the same
>> semantics (aggregate copies expanded as memcpy) or have all reads
>> performed before writes.
> So can we distinguish here between overlap and the self-copy case?

Yes, but only in a limited subset of cases.

>
> A self-copy should just be folded away.  It's no different than x = x on
> scalars except that we've dropped it to a memcpy in the IL.  Doing so
> makes the code more efficient and removes false positives from tools
> like the memstomp interposition library, making those tools more useful.

It's possible to do in simple cases but not in general.  I agree
that in the general case when overlap is possible the only safe
solution, short of actually testing for it at runtime, is to call
memmove.

Martin