[PATCH] Use type alignment in get_builtin_sync_mem

Mon Sep 9 08:37:00 GMT 2019

On Fri, Sep 6, 2019 at 3:00 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
>
> Richard Biener wrote:
> > On Tue, Sep 3, 2019 at 3:09 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > > > If you read the C standards fine-print then yes.  But people (or
> > > > even the C frontend!) hardly get that correct since for example
> > > > for
> > > >
> > > > struct __attribute__((packed)) { int i; } s;
> > > >
> > > > void foo ()
> > > > {
> > > >   __builtin_printf ("%p", &s.i);
> > > > }
> > > >
> > > > the C fronted actually creates a int * typed pointer for the ADDR_EXPR
> > > > (and not an int * variant with 1-byte alignment).  And people use
> > > > int * to pass such pointers everywhere.
> > >
> > > Well ... I'd say if you cast to int * and then use an atomic operation,
> > > it's your own fault that it fails :-/   If the frontend itself uses
> > > the wrong type, that is of course a problem.
> >
> > Yes.  The C standard says that if you cast something to a pointer type
> > the pointer has to be aligned according to the pointed-to type, otherwise
> > undefined.  But we have no chance to use this because of this kind of
> > issues (and of course developer laziness...).
>
> But as far as I can see, for *atomic* operations at least, we do make
> that assumption.  The x86 back-end for example just assumes that any
> "int" or "long" object that is the target of an atomic operation is
> naturally aligned, or else the generated code would just crash.   So
> if you used your example with a packed struct and passed that pointer
> to an atomic, the back-end would still assume the alignment and the
> code would crash.  But I'd still consider this a perfectly reasonable
> and expected behavior in this case ...

Would it crash?  I think it would be not atomic if it crossed a cache-line
boundary.

> The only thing that is special on s390 is that for the 16-byte integer
> type, the "natural" (mode) alignment is only 8 bytes, but for atomics
> we require 16 bytes.  But if you explicitly use a 16-byte aligned
> pointer type to assert to the compiler that this object *is* aligned,
> the compiler should be able to rely on that.
>
> Of course if some part of the middle end get the alignment wrong, we
> have a problem.  But I'm not sure how this could happen here.  I agree
> that it might be the case that a user-specified *under*-alignment might
> get lost somewhere (e.g. you have a packed int and somewhere in the
> optimizers this gets accidentally cast to a normal int with the default
> alignment).  But in *this* case, what would have to happen is that the
> middle-end accidentally casts something to a pointer with *higher*
> than the default alignment for the type, even though no such pointer
> cast is present in the source.  Can this really happen?

If the cast to the lower-aligned type is lost and there is an earlier cast
to the aligned type.

So - I don't see how this cannot happen.  Will it be likely?  Probably
not.

> > I'm not sure how it is done now but IIRC the users
> > use __atomic_load (ptr) and then the frontend changes that
> > to one of BUILT_IN_ATOMIC_LOAD_{N,1,2,4,8,16} based on
> > some criteria (size mainly).  I'm saying we should factor in
> > alignment here, _not_ using say BUILT_IN_ATOMIC_LOAD_16
> > if according to the C standard the data isn't aligned.  Maybe we can
> > ask the target whether the alignment according to C matches the
> > requirement for _16 expansion.  And if things are not fine
> > the FE should instead use BUILT_IN_ATOMIC_LOAD_N
> > which we later if the compiler can prove bigger alignment and N
> > is constant could expand as one of the others.
> >
> > But safety first.
>
> The problem with using the _N variant is that we then get a call
> to the _n version of the library routine, right?

Yes.

> This would actually
> be wrong on s390.  The problem is that all atomic operations on any
> one single object need to be consistent: they either all use the
> 16-byte atomic instruction, or else they all protect the access with
> a lock.  If you have parts of the code use the lock and other parts
> use the instruction, they're not actually protected against each other.

But then the user has to be consistent in accessing the object
atomically.  If he accesses it once as (aligned_int128_t *)
and once as (int128_t *) it's his fault, no?

If we'd document that the user invokes undefined behavior
when performing __builtin_atomic () on objects that are not
sufficiently aligned according to target specific needs then
we are of course fine and should simply assume the memory
is aligned accordingly (similar to your patch but probably with
some target hook specifying the atomic alignment requirement
when it differs from mode alignment).  But I don't read the
documentation of our atomic builtins that way.

Does _Atomic __int128_t work properly on s390?

> This is why the _16 version of the library routine does the runtime
> alignment check, so that all accesses to actually 16-byte aligned
> objects use the instruction, both in the library and in compiler-
> generated code.   The _n version doesn't do that.
>
> So I guess we'd have to convert the _n version back to the _16
> library call if the size is constant 16, or something?

If it is aligned 16?  But not sure - see above.

> > So yes, I'd say try tackling the issue in the frontend which is the
> > last to know the alignment in the most optimistic way (according
> > to the C standard).  At RTL expansion time we have to be (very)
> > conservative.
> >
> > Btw, the alternative is to add an extra alignment parameter
> > to all of the atomic builtins where the FE could stick the alignment
> > for us to use during RTL expansion.  But given we have the _{1,2,4,8,16,N}
> > variants it sounds easier to use those...
> >
> > Note we only document __atomic_load and __atomic_load_n so
> > the _{1,2,4,8,16} variants seem to be implementation detail.
>
> Adding the alignment parameter would work, I think.

But then the user still needs to access the int128_t with
consistent alignment, no?  Otherwise it will again not protect
against each other?

Why not always use the slow & correct variant for __int128?

Richard.

> Bye,
> Ulrich
>
> --
>   Dr. Ulrich Weigand
>   GNU/Linux compilers and toolchain
>   Ulrich.Weigand@de.ibm.com
>