[PATCH] doc: clarify the situation with pointer arithmetic

Uecker, Martin Martin.Uecker@med.uni-goettingen.de
Thu Jan 30 14:42:00 GMT 2020


On Thursday, 30.01.2020, at 09:30 +0100, Richard Biener wrote:
> On Wed, Jan 29, 2020 at 3:00 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:

...

> > > I guess I'd be much more happy if PVNI said that when
> > > an integer is converted to a pointer and the integer
> > > is value-equivalent to pointers { p1, p2, ... } then
> > > the provenance of the resulting pointer is
> > > that of p1 (or p2, ... which is semantically equivalent)
> > 
> > (if the provenance is the same)
> > 
> > > and when two pointers p1 and p2 are
> > > value-equivalent and their provenance is not the
> > > same then the behavior is undefined.
> > 
> > I see. Then here..
> > 
> > int a[3];
> > int b[3];
> > 
> > (uintptr_t)&b[0]; // b also exposed
> > int *p = (int*)(uintptr_t)&a[3];
> > 
> > ..the behavior is undefined because the
> > two pointers have identical addresses
> > but different provenance.
> > 
> > I agree, from a compiler writer's point-of-view
> > this would be a good solution. But to a programmer,
> > this would be quite difficult to explain.
> > The preference of the working group was that the casts
> > should just work in all cases and do what the
> > programmer intended, even if this prevents some
> > optimization. But I will see that this is
> > added to the list of options under consideration.
> > 
> > 
> > PVNI-ae-ud assigns the provenance of an
> > exposed object at the address. If there
> > are two possible objects (as in the example
> > above), the pointer could point to both but
> > then has to be used consistently with
> > only one object. Essentially, we want the
> > pointer to have exactly one provenance but
> > we might delay the decision. The idea is
> > that a compiler might figure out the correct
> > provenance later, e.g. by observing accesses.
> 
> I thought about alternatives to PVNI and implementation
> consequences.  But all the different kinds of must-behave-like-this
> guarantees face serious implementation difficulties I think
> so the only alternative to PVNI (which I think is implementable
> but at an optimization opportunity cost) is one that makes
> two pointers with the same value always have the same
> provenance (and otherwise make the behavior undefined).

This would need to come with precise rules about
when the occurrence of two such pointers is UB,
e.g. comparisons of such pointers, or that
two such pointers are cast to int in the same
execution.

The mere existence of such pointers should be
quite common and should not already be UB.

But I am uncomfortable with the idea that
comparison of pointers is always allowed except
for some special case which then is UB. This
might cause rare and very difficult to find bugs.

> > It is possible to formulate
> > some conditions about when a pointer converted
> > from an integer could get assigned the
> > points-to-set of a value-equivalent pointer:
> > 
> > 1) using knowledge about object location in
> > memory: If there is no adjacent object which
> > was exposed, one can conclude that the
> > provenance is the object at this address.
> 
> Usually at the point compilers want to know objects
> are not laid out.  So what compilers do is simply
> say the user cannot possibly know so it can
> choose at will (even if later object layout disagrees).

The compiler is free to choose at will. But in
my opinion, it then has to stick with its choice.

Otherwise, this leads to really abstract and
confusing semantics. The wording of the standard
also implies that UB is based on actual behavior.
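
To make the "stick with its choice" point concrete, here is a
minimal sketch based on the a[3]/b[3] example quoted earlier. The
function name round_trip_one_past is hypothetical, and the sketch
assumes that uintptr_t is available and that a and b may or may not
be adjacent in memory. The converted pointer is used only together
with a, so whichever provenance is picked has to remain a for the
rest of its use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round-trips the one-past-the-end pointer of `a`
   through an integer and then uses the result only together with `a`. */
ptrdiff_t round_trip_one_past(void)
{
    int a[3] = { 1, 2, 3 };
    int b[3] = { 4, 5, 6 };

    /* Both casts expose the respective object. */
    uintptr_t ib = (uintptr_t)(void *)&b[0];
    uintptr_t ia = (uintptr_t)(void *)&a[3];
    (void)ib; /* only the exposure of b matters here */

    /* If b happens to start right after a, the address in ia matches
       both "one past a" and "start of b". */
    int *p = (int *)(void *)ia;

    /* p is used consistently with a only: no access through p,
       just pointer arithmetic relative to a. */
    return p - a;
}
```

On typical implementations the function returns 3. Dereferencing p
as if it were &b[0] in the same execution would be the inconsistent
use that the delayed-decision rule is meant to exclude.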

> > 2) based on offsets: If the pointer points
> > in the middle of an object, there is also
> > no ambiguity.
> 
> The difficulty here lies in the requirement of
> exact offset tracking which makes (some?)
> points-to implementations prohibitively expensive.
> But yes, sure.

Yes, but perhaps there is some low-hanging fruit
where it is easy to determine.
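
As a rough illustration of the offset-based condition, the sketch
below classifies an integer address relative to one object. All
names (addr_class, classify_address, provenance_is_unambiguous) are
hypothetical, and it assumes a flat address space in which
conversion to uintptr_t preserves address order. It is a model of
the reasoning only, not of any existing points-to implementation.

```c
#include <stddef.h>
#include <stdint.h>

enum addr_class {
    ADDR_OUTSIDE,   /* not within the object, nor one past its end */
    ADDR_START,     /* first byte: may also be one past a preceding object */
    ADDR_INTERIOR,  /* strictly inside the object */
    ADDR_ONE_PAST   /* one past the end: may also be the start of the next object */
};

/* Classify `addr` relative to the object of `size` bytes at `obj`. */
enum addr_class classify_address(uintptr_t addr, const void *obj, size_t size)
{
    uintptr_t base = (uintptr_t)obj;

    if (addr < base || addr - base > size)
        return ADDR_OUTSIDE;
    if (addr == base)
        return ADDR_START;
    if (addr - base == size)
        return ADDR_ONE_PAST;
    return ADDR_INTERIOR;
}

/* Only an interior address identifies the object without any
   knowledge about neighbouring objects. */
int provenance_is_unambiguous(enum addr_class c)
{
    return c == ADDR_INTERIOR;
}
```

For int a[3], the address of a[1] is classified as interior, while
the addresses of a[0] and a[3] fall into the two boundary cases
where layout knowledge about adjacent exposed objects would be
needed.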

> > 3) a mix of both, to differentiate objects
> > before and after in memory.
> > 
> > 
> > >  That is,
> > > 
> > > int a, b;
> > > int  *p = &a + 1;
> > > int *q = &b;
> > > if (p == q)
> > >   ... undefined ...
> > 
> > We considered making the comparison undefined in the
> > specific situation where one of the pointers is a one-after
> > pointer and the other a pointer to the beginning of a
> > different object. This would solve the problems with
> > conditional equivalences.
> 
> Note my proposal doesn't make the comparison undefined
> but the case where both are equivalent cannot be reached
> at runtime without invoking undefined behavior.  That means
> we can optimize the comparison based on provenance
> where p points to a and q points to b.

Sorry, I did not get this. What are the exact conditions
for UB?

> > Others proposed to make the result of the comparison
> > unspecified, but I think this does not help.
> 
> Indeed.  It's not unspecified, it's known to evaluate to false.
> I think there's existing wording in the standard that
> allows it to evaluate to true for pointers one-after-the-object,
> that would need to be changed of course.

The problem is that if the comparison is not optimized
and the pointers have the same address, then it would
evaluate to true at run-time. If I understand correctly,
you somehow want to make this case be UB, but I haven't
quite understood how (if it is not the comparison of such
pointers that invokes UB).
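
A small sketch of the case in question, modelling the unoptimized
comparison as an address-only comparison. The helper names
same_address and one_past_meets_next are hypothetical. Whether a and
b end up adjacent depends on the layout the compiler picks, so no
particular result is claimed for the second function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address-only comparison: the result depends on nothing but the
   integer values of the two pointers. */
bool same_address(const void *p, const void *q)
{
    return (uintptr_t)p == (uintptr_t)q;
}

/* The example from the quoted text: p is one past a, q points to b.
   The result is true only if b happens to be placed directly after a. */
bool one_past_meets_next(void)
{
    int a, b;
    int *p = &a + 1;
    int *q = &b;

    return same_address(p, q);
}
```

If the comparison is instead folded to false based on provenance
while the addresses are in fact identical, the optimized and the
unoptimized program disagree, which is exactly the situation for
which the UB condition needs to be spelled out.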


> > At the moment, the consensus is that pointer
> > comparison should be always allowed and the
> > result should only depend on the address. Again,
> > the idea is to make it simpler and more consistent
> > for the programmer. But yes, this makes it more
> > difficult for the compiler writer.
> 
> it's a conflict of interest on the user side as well - users
> expect DWIM semantics but at the same time want
> fastest possible code...

Yes. It is not just DWIM but also easier-to-understand
semantics, which should lead to fewer bugs. The general feeling
is that C moved a bit too much to the side of fastest
possible code. My personal preference would be to put
PVNI-ae-ud in the standard and have compiler options which
re-enable these - then unsafe from the standard's point of view -
optimizations for those who need them.

Best,
Martin
