This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


[RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse


Hello!

As explained in PR71245, comment #3 [1], it is better to use an offset
of -4 from %esp to implement a non-SSE memory fence instruction:

-q-

I guess it costs a code byte for a disp8 in the addressing mode, but
it avoids adding a lot of latency to a critical path involving a
spill/reload to (%esp), in functions where there is something at
(%esp).

If it's an object larger than 4B, the lock orl could even cause a
store-forwarding stall when the object is reloaded.  (e.g. a double or
a vector).

Ideally we could do the  lock orl  on some padding between two locals,
or on something in memory that wasn't going to be loaded soon, to
avoid touching more stack memory (which might be in the next page
down).  But we still want to do it on a cache line that's hot, so
going way up above our own stack frame isn't good either.

-/q-

Attached RFC patch implements this proposal.

2016-05-29  Uros Bizjak  <ubizjak@gmail.com>

    * config/i386/sync.md (mfence_nosse): Use "lock orl $0, -4(%esp)".

The patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Any other opinions on this issue? The Linux kernel also implements
its memory fence in a way similar to the above proposal.
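
For illustration only, the fence discussed above can be sketched as
GNU C inline asm. This is not the attached patch (which changes the
mfence_nosse pattern in sync.md); the names fence_nosse and
store_fence_load are hypothetical, and the x86_64 branch using %rsp
and the generic fallback are my own assumptions added so the sketch
builds outside of -m32.

```c
/* Hypothetical helper: a full memory fence without SSE2's mfence.
   A locked read-modify-write of a dummy stack location orders all
   earlier loads and stores before all later ones.  OR-ing with 0
   leaves the memory contents unchanged. */
static inline void fence_nosse(void)
{
#if defined(__i386__)
    /* Offset -4 keeps the locked access away from whatever lives
       at (%esp), which may be reloaded soon. */
    __asm__ __volatile__("lock orl $0, -4(%%esp)" ::: "memory", "cc");
#elif defined(__x86_64__)
    /* Assumption: same idea with the 64-bit stack pointer. */
    __asm__ __volatile__("lock orl $0, -4(%%rsp)" ::: "memory", "cc");
#else
    /* Assumption: on other targets let the compiler pick a fence. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Hypothetical usage: publish a store, fence, then read another
   location.  The fence keeps the load from being satisfied before
   the store is globally visible. */
static inline int store_fence_load(volatile int *x, volatile int *y)
{
    *x = 1;
    fence_nosse();
    return *y;
}
```

The "memory" clobber also stops the compiler from reordering memory
accesses across the asm; "cc" is listed because orl writes the flags.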

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71245#c3

Uros.

Attachment: p.diff.txt
Description: Text document

