This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)


On Mon, 31 Jul 2017, Jeff Law wrote:
> > Please consider that expand_mem_thread_fence is used to place fences around
> > seq-cst atomic loads&stores when the backend doesn't provide a direct pattern.
> > With compiler barriers on both sides of the machine barrier, the generated
> > sequence for a seq-cst atomic load will be 7 insns:
> > 
> >   asm volatile ("":::"memory");
> >   machine_seq_cst_fence ();
> >   asm volatile ("":::"memory");
> >   dst = mem[src];
> >   asm volatile ("":::"memory");
> >   machine_seq_cst_fence ();
> >   asm volatile ("":::"memory");
> > 
> > I can easily imagine people looking at RTL dumps with this overkill fencing
> > being unhappy about this.
> But the extra fences aren't actually going to impact anything except
> perhaps an unmeasurable compile-time hit.  ie, it may look bad, but I'd
> have a hard time believing it matters in practice.

I agree it doesn't matter that much from compile-time/memory standpoint.
I meant it matters from the standpoint of a person working with the RTL dumps,
i.e. having to work through all that, especially if they need to work with
non-slim dumps.

> > I'd be more happy with detecting empty expansion via get_last_insn ().
> That seems like an unnecessary complication to me.

I think it's quite simple by GCC standards:

  {
    rtx_insn last = get_last_insn ();
    emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
    if (last == get_last_insn () && !is_mm_relaxed (model))
      expand_asm_memory_barrier ();
  }

> I'd prefer instead to just emit the necessary fencing in the generic code
> and update the MD docs so that maintainers know they don't have to emit
> the RTL fences themselves.

I agree the docs need an update, but hopefully not in this way.  The legacy
__sync_synchronize barrier has always been required to be a compiler barrier
when expanded to RTL, and it's quite natural to use the same RTL structure
for new atomic fences as the old memory_barrier expansion.  The only problem
in current practice seen so far is with empty expansion for new patterns.

Thanks.
Alexander


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]