Re: [C11-atomic] [patch] gimple atomic statements
On Wed, Apr 4, 2012 at 5:50 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 04/04/2012 10:33 AM, Richard Guenther wrote:
>>
>> On Wed, Apr 4, 2012 at 3:28 PM, Andrew MacLeod <amacleod@redhat.com>
>> wrote:
>>> This is a WIP... that fntype field is there for simplicity... and no,
>>> you can do a 1 byte atomic operation on a full word object if you want by
>>>
>> Oh, so you need a size or a mode specified rather than a "fntype"?
>
>
> Yes, poorly named perhaps as I created things... it's just a type node at
> the moment that indicates the size being operated on, which I collected
> from the built-in function.
Ok. Remember that you should use non-tree things if you can in GIMPLE
land. This probably means that both the size and the memmodel "operands"
should be
+ struct GTY(()) gimple_statement_atomic
+ {
+ /* [ WORD 1-8 ] */
+ struct gimple_statement_with_memory_ops_base membase;
+
+ /* [ WORD 9 ] */
+ enum gimple_atomic_kind kind;
+
enum gimple_atomic_memmodel memmodel;
unsigned size;
and not be trees in the ops array. Even more, both kind and memmodel
should probably go in membase.base.subcode.
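Sketched, something like the following - the bit split and the accessor
names here are just assumptions, not a requirement:

/* Pack both fields into the 16-bit subcode.  */
#define GIMPLE_ATOMIC_KIND_BITS 4
#define GIMPLE_ATOMIC_KIND_MASK ((1 << GIMPLE_ATOMIC_KIND_BITS) - 1)

static inline enum gimple_atomic_kind
gimple_atomic_kind (const_gimple gs)
{
  GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
  return (enum gimple_atomic_kind)
         (gs->gsbase.subcode & GIMPLE_ATOMIC_KIND_MASK);
}

static inline enum gimple_atomic_memmodel
gimple_atomic_memmodel (const_gimple gs)
{
  GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
  return (enum gimple_atomic_memmodel)
         (gs->gsbase.subcode >> GIMPLE_ATOMIC_KIND_BITS);
}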
>> In the example you only ever use address operands (not memory operands)
>> to the GIMPLE_ATOMIC - is that true in all cases?  Is the result always
>> non-memory?
>
> The atomic address can be any arbitrary memory location... I haven't gotten
> to that yet.  It's commonly just an address, so I'm working with that first
> as proof of concept. When it gets something else it'll trap and I'll know :-)
Ok. Please try to avoid memory operands and stick to address operands ;)
You can make tree-ssa-operands.c not consider ADDR_EXPRs in the
address operands, similar to the ADDR_EXPRs in MEM_REF operand zero.
> Results are always non-memory, other than the side effects of the atomic
> contents changing and having to update the second parameter to the
> compare_exchange routine.  The generic routines for arbitrary structures (not
> added in yet) actually just work with blocks of memory, but they are all
> handled by addresses and the functions themselves are typically void.  I was
> planning on folding them right into the existing atomic_kinds as well... I
> can recognize from the type that it won't map to an integral type.  I needed
> separate builtins in 4.7 for them since the parameter list was different.
>
>> I suppose the GIMPLE_ATOMICs are still optimization barriers for all
>> memory, not just that possibly referenced by them?
>
>
> Yes, depending on the memory model used.  It can force synchronization with
> other CPUs/threads, which will have the appearance of changing any shared
> memory location.  Various guarantees are made about whether those changes
> are visible to this thread after an atomic operation, so we can't reuse
> shared values in those cases.  Various guarantees are likewise made about
> which of this thread's changes are visible to other CPUs/threads at an
> atomic call, so that precludes moving stores downward in some models.
>
>>
>>> and during expansion to RTL, can trivially see that cmpxchg.2_4 is not
>>> used,
>>> and generate the really efficient compare and swap pattern which only
>>> produces a boolean result.
>>
>> I suppose gimple stmt folding could transform it as well?
>
> It could if I provided gimple statements for the 3 different forms of C&S. I
> was planning to just leave it this way since it's the interface being forced
> by C++11 as well as C11... and then just emit the appropriate RTL for this
> one C&S type.  The RTL patterns are already defined for the 2 easy cases for
> the __sync routines; the third one was added for __atomic.  It's possible
> that the process of integrating the __sync routines with GIMPLE_ATOMIC will
> indicate it's better to add those forms as atomic_kinds and then
> gimple_fold_stmt could take care of it as well.  Maybe that is just a good
> idea anyway... I'll keep it in mind.
>
>
>>
>>> If only cmpxchg.2_4 were used, we can generate
>>> the C&S pattern which only returns the result.  Only if we see both are
>>> actually used do we have to fall back to the much uglier pattern we have
>>> that produces both results.  Currently we always generate this pattern.
>>>
>>> Next, we have the C11 atomic type qualifier which needs to be implemented.
>>> Every reference to this variable is going to have to be expanded into one
>>> or more atomic operations of some sort.  Yes, I probably could do that by
>>> emitting built-in functions, but they are a bit more unwieldy; it's far
>>> simpler to just create gimple_statements.
>>
>> As I understand you first generate builtins anyway and then lower them?
>> Or are you planning on emitting those for GENERIC as well?  Remember
>> GENERIC is not GIMPLE, so you'd need new tree codes anyway ;)
>> Or do you plan to make __atomic an integral part of GENERIC and thus
>> do this lowering during gimplification?
>
> I was actually thinking about doing it during gimplification... I hadn't
> gotten as far as figuring out what to do with the functions from the front
> end yet.  I don't know that code well, but I was in fact hoping there was a
> way to 'recognize' the function names easily and avoid built-in functions
> completely...
Heh ... you'd still need a GENERIC representation then. Possibly
an ATOMIC_OP tree may do.
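That would only need a new tree.def entry, something like (purely
hypothetical, including the operand count):

/* Hypothetical GENERIC code for an atomic operation; operands would
   be the kind, the address and the memory order.  */
DEFTREECODE (ATOMIC_OP, "atomic_op", tcc_expression, 3)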
> The C parser is going to have to understand the set of C11 routine names for
> all these anyway... I figured there was something in there that could be
> done.
>
>
>
>>> I also hope that, when done, I can remove all the ugly built-in overload
>>> code that was created for __sync and continues to be used by __atomic.
>>
>> But the builtins will stay for our users' consumption and libstdc++ use,
>> no?
>
>
> Well, the names must remain exposed and recognizable since they are 'out
> there'.  Maybe under the covers I can just leave them as normal calls and
> then during gimplification simply recognize the names and generate
> GIMPLE_ATOMIC statements directly from the CALL_EXPR.  That would be ideal.
> That way there are no builtins any more.
I suppose my question was whether the frontends need to do something about
the __atomic keyword or if that is simply translated to some type flag - or
does that keyword apply to operations, not to objects or types?
>
>
>>> So bottom line, a GIMPLE_ATOMIC statement is just an object that is much
>>> easier to work with.
>>
>> Yes, I see that it is easier to work with for you.  All other statements
>> will see GIMPLE_ATOMICs as blockers for their work though, even if they
>> already deal with calls just fine - that's why I most of the time suggest
>> to use builtins (or internal fns) to do things (I bet you forgot to update
>> enough predicates ...).  Can GIMPLE_ATOMICs throw with
>> -fnon-call-exceptions?
>> I suppose yes.  One thing you missed at least ;)
>
>
> Not that I am aware of; they are 'noexcept'.  But I'm sure I've missed more
> than a few things so far.  I'm pretty early in the process :-)
What about compare-exchange on a pointer dereference? The pointer
dereference surely can trap, so it can throw with -fnon-call-exceptions. No?
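Consider (just an illustration, using the existing 4.7 builtin):

#include <stdbool.h>

/* With -fnon-call-exceptions the dereference of P can trap and thus
   throw - P may be null or point into an unmapped page.  */
bool
cas (int *p, int *expect, int desired)
{
  return __atomic_compare_exchange_n (p, expect, desired, false,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}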
>>
>>> It cleans up both initial creation and rtl generation,
>>> as well as being easier to manipulate. It also encompasses an entire class
>>> of operations that are becoming more integral *if* we can make them
>>> efficient, and I hope to actually do some optimizations on them eventually.
>>> I had a discussion last fall with Linus about what we needed to be able to
>>> do to them in order for the kernel to use __atomic instead of their
>>> home-rolled solutions.  Could I do everything with builtins? Sure... it's
>>> just more awkward and this approach seems cleaner to me.
>>
>> Cleaner if you look at it in isolation - messy if you consider that not
>> only things working with atomics need to (not) deal with this new stmt
>> kind.
>
>
> They can affect shared memory in some ways like a call, but don't have many
> of the other attributes of a call.  They are really more like an assignment
> or other operation with arbitrary shared memory side effects.  I do hope to
> be able to teach the optimizers the directionality of the memory model
> restrictions.  I.e., ACQUIRE is only a barrier to hoisting shared memory
> code... stores can be moved downward past this model. RELEASE is only a
> barrier to sinking code.  RELAXED is no barrier at all to code motion.  In
> fact, a relaxed store is barely different from a regular store... but there
> is a slight difference, so we can't make it a normal store :-P.
>
> By teaching the other parts of the compiler about a GIMPLE_ATOMIC, we could
> hopefully lessen their impact eventually.
Ok. I suppose having a GIMPLE_ATOMIC is fine then.
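For reference, the directionality described above is the standard C11
one - a minimal illustration (a sketch of the semantics only, assuming
a C11 <stdatomic.h> is available; this is not GCC-internal code):

#include <stdatomic.h>

atomic_int flag;
int x, y;

void acquire_example (void)
{
  x = 1;  /* earlier plain store: may legally sink below the load */
  int r = atomic_load_explicit (&flag, memory_order_acquire);
  y = r;  /* later access: may not be hoisted above the load */
}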
+ /* In tree-atomic.c. */
+ extern bool expand_gimple_atomic_load (gimple);
err, gimple-atomic.c please ;)
+ /* Return the expression field of atomic operation GS. */
+
+ static inline tree
+ gimple_atomic_expr (const_gimple gs)
+ {
+ GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+ gcc_assert (gimple_atomic_has_expr (gs));
+ return gimple_op (gs, 2);
+ }
err - what's "expression" in this context? I hope it's not an arbitrary
tcc_expression tree?!
+ static inline bool
+ gimple_atomic_has_fail_order (const_gimple gs)
+ {
+ return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE;
+ }
btw, these kinds of predicates look superfluous to me - if they are true
exactly for one atomic kind then users should use the predicate to
test for that specific atomic kind, not for some random field presence.
+ /* Return the arithmetic operation tree code for atomic operation GS. */
+
+ static inline enum tree_code
+ gimple_atomic_op_code (const_gimple gs)
+ {
+ GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+ gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FETCH_OP ||
+ gimple_atomic_kind (gs) == GIMPLE_ATOMIC_OP_FETCH);
+ return (enum tree_code) gs->gsbase.subcode;
+ }
now, what was it with this "expression" thing again? Btw, ||s go to the
next line. subcode should be the atomic kind - it seems that you glob
too much into GIMPLE_ATOMIC and that you maybe should sub-class
GIMPLE_ATOMIC properly via the subcode field. You'd have a
gimple_atomic_base which fences could use, for example.
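I.e. something along these lines (only a sketch - the exact layout and
GTY markup are up to you):

/* Base for operand-less atomics such as fences; kind and memmodel
   would live in membase.base.subcode.  */
struct GTY(()) gimple_statement_atomic_base
{
  /* [ WORD 1-8 ] */
  struct gimple_statement_with_memory_ops_base membase;
};

/* Full variant carrying tree operands for the non-fence kinds.  */
struct GTY(()) gimple_statement_atomic
{
  /* [ WORD 1-8 ] */
  struct gimple_statement_atomic_base atomic_base;

  /* [ WORD 9 ] */
  tree GTY((length ("%h.atomic_base.membase.opbase.gsbase.num_ops"))) op[1];
};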
All asserts in inline functions should be gcc_checking_asserts.
+ gimple
+ gimple_build_atomic_load (tree type, tree target, tree order)
+ {
+ gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 3);
+ gimple_atomic_set_kind (s, GIMPLE_ATOMIC_LOAD);
+ gimple_atomic_set_order (s, order);
+ gimple_atomic_set_target (s, target);
+ gimple_atomic_set_type (s, type);
+ gimple_set_has_volatile_ops (s, true);
you should have a gimple_build_atomic_raw function that takes all
operands and the atomic kind, avoiding the need for all the repeated
calls of gimple_atomic_set_* as well as all the repeated checking
this causes.
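I.e. (the signature and the operand slots below are just a sketch):

/* Set the kind once and store operands directly instead of going
   through the individually checked setters.  */
static gimple
gimple_build_atomic_raw (enum gimple_atomic_kind kind, tree type,
                         tree target, tree order, unsigned num_ops)
{
  gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, num_ops);
  gimple_atomic_set_kind (s, kind);
  gimple_atomic_set_type (s, type);
  gimple_set_op (s, 0, order);
  gimple_set_op (s, 1, target);
  gimple_set_has_volatile_ops (s, true);
  return s;
}

gimple
gimple_build_atomic_load (tree type, tree target, tree order)
{
  return gimple_build_atomic_raw (GIMPLE_ATOMIC_LOAD, type, target,
                                  order, 3);
}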
+ else if (is_gimple_atomic (stmt))
+ {
+ tree t;
+ if (visit_store)
+ {
+ for (i = 0; i < gimple_atomic_num_lhs (stmt); i++)
+ {
+ t = gimple_atomic_lhs (stmt, i);
+ if (t)
+ {
+ t = get_base_loadstore (t);
+ if (t)
+ ret |= visit_store (stmt, t, data);
+ }
I thought results are always registers? The walk_stmt_load_store_addr_ops
looks wrong to me. As the loads/stores are implicit (you only have addresses)
you have to adjust all callers of walk_stmt_load_store_addr_ops to handle
atomics specially, as they expect to be shown all loads/stores that way.
Those also handle calls and asms (for the "memory" clobber) specially.
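For atomics only the addresses are explicit, so the walker would have to
present those via visit_addr, e.g. (sketch only):

else if (is_gimple_atomic (stmt) && visit_addr)
  {
    /* Strip the ADDR_EXPR so callers see the base and can treat the
       implicit access as both a load and a store.  */
    for (i = 0; i < gimple_atomic_num_rhs (stmt); i++)
      {
        tree op = gimple_op (stmt, i);
        if (op && TREE_CODE (op) == ADDR_EXPR)
          ret |= visit_addr (stmt, TREE_OPERAND (op, 0), data);
      }
  }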
+ case GIMPLE_ATOMIC:
+ /* Atomic operations are memory barriers in both directions for now. */
+ add_virtual_operand (stmt, opf_def | opf_use);
It's surely not relevant that "in both directions for now" but instead that
"Atomic operations have side-effects on memory."
+ for (n = 0; n < gimple_atomic_num_lhs (stmt); n++)
+ get_expr_operands (stmt, gimple_atomic_lhs_ptr (stmt, n), opf_def);
+ for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+ get_expr_operands (stmt, gimple_op_ptr (stmt, n), opf_use);
+ break;
Do address-takens in operands make the addresses escape? If not
you should pass opf_non_addressable as well.
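Similar to the MEM_REF case, e.g. (sketch):

tree op = gimple_op (stmt, n);
/* Walk the base of a taken address without making it escape, like
   operand zero of a MEM_REF.  */
if (TREE_CODE (op) == ADDR_EXPR)
  get_expr_operands (stmt, &TREE_OPERAND (op, 0),
                     opf_use | opf_non_addressable);
else
  get_expr_operands (stmt, gimple_op_ptr (stmt, n), opf_use);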
Index: tree-ssa-alias.c
===================================================================
*** tree-ssa-alias.c (revision 186098)
--- tree-ssa-alias.c (working copy)
*************** ref_maybe_used_by_stmt_p (gimple stmt, t
*** 1440,1445 ****
--- 1440,1447 ----
}
else if (is_gimple_call (stmt))
return ref_maybe_used_by_call_p (stmt, ref);
+ else if (is_gimple_atomic (stmt))
+ return true;
please add a comment before these atomic handlings that we assume
they are using/clobbering all refs because they are considered memory
optimization barriers. Btw, does this apply to non-address-taken automatic
references? I suppose not. Consider:
int foo()
{
struct { int a; } s;
atomic_fence();
s.a = 1;
atomic_fence();
return s.a;
}
we still should be able to optimize this to return 1, no? At least SRA will
happily do similar things in a non-flow-sensitive way. Please add a FIXME
to the alias predicates at least, or even better fix this missed optimization.
There is no need to establish "backwards missed-optimization compatibility"
the way we do for asms.
*************** stmt_kills_ref_p_1 (gimple stmt, ao_ref
*** 1814,1819 ****
--- 1818,1825 ----
}
}
+ if (is_gimple_atomic (stmt))
+ return true;
that's for sure wrong ;) It should return false.
Index: tree-ssa-sink.c
===================================================================
*** tree-ssa-sink.c (revision 186098)
--- tree-ssa-sink.c (working copy)
*************** is_hidden_global_store (gimple stmt)
*** 145,150 ****
--- 145,154 ----
{
tree lhs;
+ /* Don't optimize across an atomic operation. */
+ if (is_gimple_atomic (stmt))
+ return true;
+
that's bogus, too (really all uses of is_hidden_global_store should go away).
Please look into the few callers of this function and handle atomics in
a correct way explicitly.
+ else if (is_gimple_atomic (stmt))
+ {
+ unsigned n;
+
+ /* We may be able to lessen this with more relaxed memory
+ models, but for now, its a full barrier. */
+ mark_all_reaching_defs_necessary (stmt);
+
+ for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+ {
+ tree t = gimple_op (stmt, n);
+ if (TREE_CODE (t) != SSA_NAME &&
+ TREE_CODE (t) != INTEGER_CST &&
+ !is_gimple_min_invariant (t) &&
+ !ref_may_be_aliased (t))
+ mark_aliased_reaching_defs_necessary (stmt, t);
+ }
+ }
for sure atomics do not make non-aliased automatic variable stores necessary.
At least I hope so. As there are only addresses in the atomic ops the
code looks wrong to me (and &&s go to the next line ...). As you are
marking everything needed anyway you can just remove the bogus loop
completely.
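I.e. simply (sketch):

else if (is_gimple_atomic (stmt))
  /* Atomics act as full barriers for now; relaxed memory models may
     allow this to be lessened later.  */
  mark_all_reaching_defs_necessary (stmt);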
+ case GIMPLE_ATOMIC:
+ /* Treat this like a call for now, it may expand into a call. */
+ if (gimple_atomic_kind (stmt) != GIMPLE_ATOMIC_FENCE)
+ cost = gimple_num_ops (stmt) *
+ estimate_move_cost (TREE_TYPE (gimple_atomic_target (stmt)));
+ else
+ cost = 1;
+ break;
for sure this counts way too many "ops". There is at most a single
memory operand as far as I understand. A size/speed cost of 1 for
a FENCE is also too low.
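Something closer would be (the constants below are placeholders, not
tuned values):

case GIMPLE_ATOMIC:
  /* At most one memory operand, plus the synchronization itself,
     which is not free.  */
  if (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_FENCE)
    cost = 4;
  else
    cost = 4 + estimate_move_cost (TREE_TYPE (gimple_atomic_target (stmt)));
  break;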
I miss handling of GIMPLE_ATOMICs in tree-ssa-structalias.c.
Richard.
> Andrew
>