[RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse

Uros Bizjak ubizjak@gmail.com
Sun Feb 19 18:29:00 GMT 2017


On Fri, Feb 17, 2017 at 5:59 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Fri, Feb 17, 2017 at 5:30 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Sun, May 29, 2016 at 11:10:15PM +0200, Uros Bizjak wrote:
>>> As explained in PR71245, comment #3 [1], it is better to use offset -4
>>> to a %esp to implement a non-SSE memory fence instruction:
>>>
>>> -q-
>>>
>>> I guess it costs a code byte for a disp8 in the addressing mode, but
>>> it avoids adding a lot of latency to a critical path involving a
>>> spill/reload to (%esp), in functions where there is something at
>>> (%esp).
>>>
>>> If it's an object larger than 4B, the lock orl could even cause a
>>> store-forwarding stall when the object is reloaded.  (e.g. a double or
>>> a vector).
>>>
>>> Ideally we could do the  lock orl  on some padding between two locals,
>>> or on something in memory that wasn't going to be loaded soon, to
>>> avoid touching more stack memory (which might be in the next page
>>> down).  But we still want to do it on a cache line that's hot, so
>>> going way up above our own stack frame isn't good either.
>>
>> Unfortunately this makes valgrind unhappy about that:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1423434
>> I assume it will complain now on anything pre-SSE2 that contains the memory
>> barrier in 32-bit code.
>> Perhaps we should decrement and increment %esp around it or something
>> similar (or push/pop)?  Of course, that would mean we need to take care
>> of async unwind info.
>
> Or, we can simply revert the patch? Not that the barrier performance
> of non-SSE 32bit targets matter...

Attached patch was committed to mainline to revert 2016-05-30 change.

2017-02-19  Uros Bizjak  <ubizjak@gmail.com>

    Revert:
    2016-05-30  Uros Bizjak  <ubizjak@gmail.com>

    * config/i386/sync.md (mfence_nosse): Use "lock orl $0, -4(%esp)".

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}

Uros.

--cut here--
Index: config/i386/sync.md
===================================================================
--- config/i386/sync.md (revision 245574)
+++ config/i386/sync.md (working copy)
@@ -98,7 +98,7 @@
        (unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))
    (clobber (reg:CC FLAGS_REG))]
   "!(TARGET_64BIT || TARGET_SSE2)"
-  "lock{%;} or{l}\t{$0, -4(%%esp)|DWORD PTR [esp-4], 0}"
+  "lock{%;} or{l}\t{$0, (%%esp)|DWORD PTR [esp], 0}"
   [(set_attr "memory" "unknown")])

--cut here--



More information about the Gcc-patches mailing list