This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse

From: Uros Bizjak <ubizjak at gmail dot com>
To: Jakub Jelinek <jakub at redhat dot com>
Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, peter at cordes dot ca
Date: Fri, 17 Feb 2017 17:59:30 +0100
Subject: Re: [RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse
Authentication-results: sourceware.org; auth=none
References: <CAFULd4ZB8jehEJZBDmn10HGqQvOho9MJ9wDZVorRmbZMduJxDA@mail.gmail.com> <20170217163022.GK1849@tucnak>

On Fri, Feb 17, 2017 at 5:30 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Sun, May 29, 2016 at 11:10:15PM +0200, Uros Bizjak wrote:
>> As explained in PR71245, comment #3 [1], it is better to use offset -4
>> from %esp to implement a non-SSE memory fence instruction:
>>
>> -q-
>>
>> I guess it costs a code byte for a disp8 in the addressing mode, but
>> it avoids adding a lot of latency to a critical path involving a
>> spill/reload to (%esp), in functions where there is something at
>> (%esp).
>>
>> If it's an object larger than 4B, the lock orl could even cause a
>> store-forwarding stall when the object is reloaded.  (e.g. a double or
>> a vector).
>>
>> Ideally we could do the  lock orl  on some padding between two locals,
>> or on something in memory that wasn't going to be loaded soon, to
>> avoid touching more stack memory (which might be in the next page
>> down).  But we still want to do it on a cache line that's hot, so
>> going way up above our own stack frame isn't good either.
>
> Unfortunately this makes valgrind unhappy:
> https://bugzilla.redhat.com/show_bug.cgi?id=1423434
> I assume it will complain now on anything pre-SSE2 that contains the memory
> barrier in 32-bit code.
> Perhaps we should decrement and increment %esp around it or something
> similar (or push/pop)?  Of course, that would mean we need to take care
> of async unwind info.

Or we can simply revert the patch? Not that the barrier performance
of non-SSE 32-bit targets matters...

Uros.

Follow-Ups:
- Re: [RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse
  - From: Jakub Jelinek
- Re: [RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse
  - From: Uros Bizjak

References:
- Re: [RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse
  - From: Jakub Jelinek
