This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: C provenance semantics proposal

From: "Uecker, Martin" <Martin dot Uecker at med dot uni-goettingen dot de>
To: "richard dot guenther at gmail dot com" <richard dot guenther at gmail dot com>
Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, "Peter dot Sewell at cl dot cam dot ac dot uk" <Peter dot Sewell at cl dot cam dot ac dot uk>, "law at redhat dot com" <law at redhat dot com>, "cl-c-memory-object-model at lists dot cam dot ac dot uk" <cl-c-memory-object-model at lists dot cam dot ac dot uk>
Date: Thu, 18 Apr 2019 13:24:54 +0000
Subject: Re: C provenance semantics proposal
References: <CAHWkzRTd-AsOxOckRaDAbyqWQf_tytFACubN9wpi=NG6=ha_jA@mail.gmail.com> <ddf469fd-685c-8f99-9164-bb62ec435685@redhat.com> <CAHWkzRTp8fFqXo7M5U5idHubxg3Q7rJ6GCqkG+o1-T8V8vCaYg@mail.gmail.com> <CAFiYyc0Tc4Et8ND73Zb14goRs95ZwuCE48wrGB=JXjSTTjgwcA@mail.gmail.com> <CAHWkzRTU_qoKe375UrOb9eej757XHGq4TkdF7vuCzFp=T4wqqg@mail.gmail.com> <CAFiYyc3Ri_U5Sqsv1gm6JhsOv=DYLB6LxtSLy7smP9sr-g+LWA@mail.gmail.com> <1555502021.4884.1.camel@med.uni-goettingen.de> <CAFiYyc0qeqcRgV7aFQSRwhief4_e3_wVC=b-xQfXTc-+YjG4yQ@mail.gmail.com> <1555505779.4884.4.camel@med.uni-goettingen.de> <CAFiYyc37AjBiLHqimgcwB5LD4hXN4YnZ+4Hzz2KS1ut-GfApCA@mail.gmail.com> <1555510321.4884.7.camel@med.uni-goettingen.de> <CAFiYyc0pUH4U0d2jcb3JYuKdoLgW-x5LjZ9AqHR6BQnxGDrtbw@mail.gmail.com> <CAFiYyc0KZfuhwtKKwcyVZuJ6w-metirM-VNYVs5D2_AqU6ZrHg@mail.gmail.com> <1555588638.12545.1.camel@med.uni-goettingen.de> <CAFiYyc15wqXeiCokWY7muWEmZ13VBnLZE2ET=dgK=3uxWD7_Yw@mail.gmail.com>

On Thursday, 18.04.2019, at 14:30 +0200, Richard Biener wrote:
> On Thu, Apr 18, 2019 at 1:57 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> > 
> > On Thursday, 18.04.2019, at 11:56 +0200, Richard Biener wrote:
> > > On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > > 

> > > > The additional issue that appears here though
> > > > is that we cannot even turn (int *)(uintptr_t)p
> > > > into p anymore since with the conditional
> > > > substitution we can then still arrive at
> > > > effectively (&y)[-1] = 1 which is of course
> > > > undefined behavior.
> > > > 
> > > > That is, your proposal makes
> > > > 
> > > >  ((int *)(uintptr_t)&y)[-1] = 1
> > > > 
> > > > well-defined (if &y - 1 == &x) but keeps
> > > > 
> > > >   (&y)[-1] = 1
> > > > 
> > > > as undefined which strikes me as a little bit
> > > > inconsistent.  If that's true it's IMHO worth
> > > > a defect report and second consideration.
> > 
> > This is true. But I would not call it inconsistent.
> > It is just unusual if you expect that casts to integers
> > and back are no-ops.  In this proposal a round-trip has
> > the effect of stripping the original provenance and
> > attaching a new one (which could be the same as the
> > old one).
> 
> Well, the standard explicitly says that if you convert
> a pointer to an integer (with the same or more precision)
> and back you get the same pointer back.  That suggests
> (int *)(uintptr_t)&y is a semantic no-op?

Not quite: it only guarantees that the result compares equal
to the original (7.20.1.4), which for pointers is (sadly) not
the same as being the same pointer.

But our proposal would make it work perfectly from a
programmer's point of view: The pointer you get back
can always be used instead of the original pointer.
But because it is not always clear whether this was
a pointer to a first element or a one-after pointer,
it has to work for both. For the compiler writer this
means that it is not the same pointer but a pointer
one knows less about.
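
A minimal sketch of the guarantee discussed here, assuming an
implementation that provides uintptr_t. The helper name
roundtrip_compares_equal is made up for illustration. It only
shows what 7.20.1.4 promises (the result compares equal after a
round trip via void *); it says nothing about provenance.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round-trip a pointer through uintptr_t and
   report whether the result compares equal to the original.
   The conversions go via void * because that is the case
   7.20.1.4 actually covers. */
static int roundtrip_compares_equal(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;  /* pointer -> integer */
    int *q = (int *)(void *)bits;           /* integer -> pointer */
    return q == p;
}

/* Store through the round-tripped pointer instead of the original. */
static void store_via_roundtrip(int *p, int value)
{
    int *q = (int *)(void *)(uintptr_t)(void *)p;
    *q = value;
}
```

Under the proposal, the store through q is fine because q can be
used wherever the original pointer could be used.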

> > While in this specific scenario this might seem
> > unreasonable, there are other examples where you may
> > want to be able to get from one object to the others,
> > and using casts to integers would then be the
> > blessed way to express this.
> 
> Sure, no arguing about this.  So far this all has been in
> the hands of implementors to make uses of this idiom work,
> now users will be able to wield the standards sword :/

Well, isn't this the point of a standard? But we want
to get this right and this is why we are talking to you.

> > In my opinion, this is also intuitive:
> > By casting to an integer one then gets simple discrete
> > pointer semantics where one does not have provenance.
> > 
> > 
> > > Similarly that
> > > 
> > > int x;
> > > int y;
> > > uintptr_t pj = (uintptr_t)&y;
> > > 
> > > if (&x + 1 == &y) {
> > > 
> > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > >    p[-1] = 1;         // well defined?
> > > }
> > > 
> > > is undefined but when I add a no-op
> > > 
> > >  (uintptr_t)&x;
> > > 
> > > it is well-defined is undesirable.  Can this no-op
> > > stmt appear in another function?  Or even in
> > > another translation unit (if x and y are global variables)?
> > > And does such stmt have to be present (in another
> > > TU) to make the example valid in this case?
> > 
> > Without that statement, the example is not valid as the
> > address of 'x' is not exposed. With the statement this
> > becomes valid and it does not matter where this statement
> > appears. Again, I agree that the fact that such a statement
> > has a side effect is something one needs to get used to.
> > 
> > But address-taking already has a side effect which could be
> > surprising, doesn't it? If I understood your answer
> > above correctly, for GCC you get this side-effect already
> > without the cast:
> > 
> > &x;
> 
> Well, yes.  But for GCC the important issue is whether
> this address-taking is still done after optimization
> (at the point we use provenance info to compute points-to sets).
> So this plain stmt wouldn't survive and would not make
> the example valid.  It's of course a lot harder to write this
> down into standard wording ;)  (if not impossible...)

"it has a side-effect whenever GCC does not optimize it away"
seems unlikely to get accepted in the standard ;-)

One could make a special rule about the statements with
unused results or add some language about "observability".

But couldn't the frontend simply mark the relevant casts?
(e.g. transform into __builtin_expose() or something)  
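
At source level, the effect of such a marker might be sketched as
below. Note that expose_address and exposed_sink are hypothetical
names standing in for the __builtin_expose() idea; GCC has no such
builtin, and the volatile store is only an assumption about how to
keep the conversion from being discarded as an unused result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the __builtin_expose() idea; not an
   existing GCC builtin. The volatile store makes the integer
   conversion observable, so it cannot be dropped as dead code. */
static volatile uintptr_t exposed_sink;

static void expose_address(void *p)
{
    exposed_sink = (uintptr_t)p;
}

/* Expose *x, then rebuild a pointer to it from its integer value
   and store through the rebuilt pointer. */
static int write_via_integer(int *x, int value)
{
    expose_address(x);
    uintptr_t bits = (uintptr_t)(void *)x;
    int *p = (int *)(void *)bits;
    *p = value;
    return *x;
}
```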

> I guess there has to be a data dependence between an address-taken
> operation and recreating that address (or a derived one to the same
> object).  That is, we're trying to support delta-compressing pointers
> as often used in shared memory data structures.
> 
> But as you've seen already conditional "dependences" are prone
> to break.

Yes, this is why we do not like it. Even assuming we could
make this sound, it would add a lot of complexity.

Limiting "provenance tracking" to pointers, where there
is only a very limited number of possible operations to begin
with and where we get 99% of the benefits, makes a lot of
sense to me. But then every cast to an integer means we do
not track and the pointer has escaped.
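
The delta-compressed pointer idiom mentioned above is the kind of
code for which a cast to an integer has to count as an escape. A
rough sketch, with made-up names (rel_ptr, rel_encode, rel_decode)
and assuming a flat address space where converting the recomputed
integer back yields a usable pointer:

```c
#include <assert.h>
#include <stdint.h>

/* A "relative pointer": the target is stored as the byte distance
   from the address of the field itself, so the structure remains
   usable when a shared-memory segment is mapped elsewhere. */
typedef struct {
    uintptr_t delta;  /* unsigned, so wrap-around is well defined */
} rel_ptr;

/* Record target as an offset from the location of r. */
static void rel_encode(rel_ptr *r, void *target)
{
    r->delta = (uintptr_t)target - (uintptr_t)(void *)r;
}

/* Recreate the target address from the location of r plus the
   stored offset. The result has gone through integer arithmetic,
   so a compiler cannot track where it points. */
static void *rel_decode(rel_ptr *r)
{
    return (void *)((uintptr_t)(void *)r + r->delta);
}
```

Both casts to uintptr_t in rel_encode would expose the objects
involved, which is what makes the pointer returned by rel_decode
usable under the proposal.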

> > For the statement to appear elsewhere, the address must
> > escape first. I would expect a compiler to treat a
> > cast to an integer identically to an escaped address.
> 
> Sure, (uintptr_t)&a also takes the address of a and passing
> that integer to a function makes the address of the object a
> escape.

My point is that even without the integer escaping, the
integer cast would imply it has escaped. But casts elsewhere
do not need to be considered, because this means the address
has escaped anyway.

> > > To me all this makes requiring exposal through a cast
> > > to a non-pointer (or accessing its representation) not
> > > in any way more "useful" for an optimizing compiler than
> > > modeling exposal through address-taking.
> > 
> > There would be a difference for cases like this:
> > 
> > int x[3];
> > int y;
> > 
> > x[0] = 1;
> > uintptr_t pj = (uintptr_t)&y;
> > 
> > if (pi + 4 == pj) {
> > 
> >   int* p = (int*)(pi + 4);
> >   p[-1] = 1;
> > }
> > 
> > Here 'x' is not exposed in our proposal so the assignment
> > via 'p' is invalid but the address is taken implicitly.
> 
> Via the x[0] - yes.  Unfortunate details of the C standard ;)
> 
> > Another example is storage allocated via malloc/alloca
> > where there is always a pointer involved but which is
> > not automatically exposed in our proposal.
> 
> True, but the compiler nevertheless has to assume it is exposed
> once that pointer escapes the current function (or TU).  It's
> hard to make the validity decision at parsing time and at
> optimization time a stmt like
> 
>  (uintptr_t)ptr;
> 
> is gone very quickly.

Why not transform it into __builtin_expose in the frontend?

Best,
Martin
