This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
GCC/GLIBC and non-temporal instructions
- From: Sergey Oboguev <oboguev at yahoo dot com>
- To: gcc at gcc dot gnu dot org
- Date: Tue, 3 Jan 2012 10:38:01 -0800 (PST)
- Subject: GCC/GLIBC and non-temporal instructions
I have a multi-threaded C++ application that relies on fine-grained parallelism and
makes extensive use of interlocked instructions and memory barriers for
inter-thread synchronization and communication.
I currently use LFENCE/SFENCE/MFENCE instructions for memory barriers (on
processors that have these instructions, otherwise resorting to LOCK-OR).
I would like to relax the barriers from LFENCE/SFENCE/MFENCE to
LOCK-OR/no-op/LOCK-OR respectively -- provided I can verify that neither the
compiler-generated code nor the runtime library uses non-temporal SSE/3DNow
instructions without properly bracketing them with memory fences.
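
A minimal sketch of the two barrier mappings described above, assuming GCC
inline assembly on x86/x64. The helper names and the RELAXED_BARRIERS switch are
hypothetical and only for illustration; the relaxed variant is valid only under
the assumption that no unfenced non-temporal accesses occur:

```cpp
#include <cassert>

// Hypothetical build-time switch: 1 = LOCK-OR/no-op/LOCK-OR, 0 = xFENCE.
#ifndef RELAXED_BARRIERS
#define RELAXED_BARRIERS 1
#endif

#if defined(__x86_64__) || defined(__i386__)

// Locked OR of zero into the top of the stack: changes no data, but acts
// as a full barrier for ordinary (temporal, write-back) memory accesses.
static inline void lock_or_barrier() {
#if defined(__x86_64__)
    __asm__ __volatile__("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#else
    __asm__ __volatile__("lock; orl $0, (%%esp)" ::: "memory", "cc");
#endif
}

#if RELAXED_BARRIERS
static inline void load_barrier()  { lock_or_barrier(); }
// No CPU instruction at all; only keeps the compiler from reordering.
static inline void store_barrier() { __asm__ __volatile__("" ::: "memory"); }
static inline void full_barrier()  { lock_or_barrier(); }
#else
static inline void load_barrier()  { __asm__ __volatile__("lfence" ::: "memory"); }
static inline void store_barrier() { __asm__ __volatile__("sfence" ::: "memory"); }
static inline void full_barrier()  { __asm__ __volatile__("mfence" ::: "memory"); }
#endif

#else  // not x86: fall back to the GCC builtin so the sketch still compiles

static inline void load_barrier()  { __sync_synchronize(); }
static inline void store_barrier() { __sync_synchronize(); }
static inline void full_barrier()  { __sync_synchronize(); }

#endif

static int payload;
static volatile int ready;

// Publish a value and read it back through the barriers. Run on a single
// thread here; in real use the two halves would be on different threads.
int publish_and_consume(int value) {
    payload = value;
    store_barrier();   // payload must be visible before the flag
    ready = 1;
    full_barrier();
    if (!ready) return -1;
    load_barrier();    // flag must be read before the payload
    return payload;
}

void self_check() {
    assert(publish_and_consume(42) == 42);
}
```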
The reasons for the desired relaxation are:
1) LOCK-OR is believed to take fewer cycles than the xFENCE instructions (this
appears to be a common belief, though I have not benchmarked them yet);
2) more importantly, memory barriers are usually coupled with interlocked
instructions that access the data used for inter-thread synchronization
anyway. On x86/x64 the interlocked instruction operating on the primary datum
already acts as a memory barrier, so the extra xFENCE next to it is
redundant -- except for the possibility of unbracketed non-temporal
instructions in the CPU instruction stream.
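
To illustrate point 2, here is a small sketch using GCC's __sync builtins,
which compile to LOCK CMPXCHG on x86/x64. The Mailbox type and function names
are hypothetical. Every state change goes through an interlocked instruction,
so no separate xFENCE is issued; this is only sufficient as long as the payload
is written with ordinary (temporal) stores:

```cpp
#include <cassert>

enum { EMPTY = 0, BUSY = 1, FULL = 2 };

// Hypothetical one-slot mailbox shared between threads.
struct Mailbox {
    int payload;
    int state;
};

// Claim the empty slot, fill it, then mark it full. Both transitions are
// interlocked compare-and-swap operations and act as full barriers.
bool try_post(Mailbox& m, int value) {
    if (!__sync_bool_compare_and_swap(&m.state, EMPTY, BUSY))
        return false;                       // slot not empty
    m.payload = value;                      // ordinary store
    __sync_bool_compare_and_swap(&m.state, BUSY, FULL);
    return true;
}

// Claim the full slot, read it, then mark it empty again.
bool try_take(Mailbox& m, int& out) {
    if (!__sync_bool_compare_and_swap(&m.state, FULL, BUSY))
        return false;                       // nothing to take
    out = m.payload;                        // ordinary load
    __sync_bool_compare_and_swap(&m.state, BUSY, EMPTY);
    return true;
}

// Single-threaded walk through the protocol; returns the value taken,
// or -1 if any step behaves unexpectedly.
int mailbox_demo() {
    Mailbox m = { 0, EMPTY };
    int v = 0;
    if (!try_post(m, 7)) return -1;
    if (try_post(m, 8))  return -1;         // second post must be refused
    if (!try_take(m, v)) return -1;
    if (try_take(m, v))  return -1;         // slot is empty again
    return v;
}
```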
The issue is significant since the application can make thousands of
inter-thread transactions per second.
My question: is there any GCC/runtime library policy on the use of non-temporal
SSE/3DNow instructions?
Specifically, can an application expect that:
1) Compiler-generated code will not contain blocks of non-temporal instructions
that are not bracketed by xFENCE on both sides?
2) Normal application code that may engage in inter-thread communication won't
be embedded inside such blocks?
3) The run-time library won't use blocks of non-temporal instructions that are
not bracketed by xFENCE on both sides?
Or is it a completely gray area?
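
For reference, this is roughly what I mean by a bracketed block of non-temporal
instructions. It is a sketch only, assuming the SSE2 intrinsics from
<emmintrin.h>; the function and variable names are hypothetical and do not come
from GCC or GLIBC:

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>   // _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE)
#endif

static int buffer[16];
static volatile int buffer_ready = 0;

// Fill dst with non-temporal stores, fenced on both sides so the weakly
// ordered stores cannot leak past the surrounding code.
void fill_streaming(int* dst, std::size_t n, int value) {
#if defined(__SSE2__)
    _mm_sfence();                            // leading bracket
    for (std::size_t i = 0; i < n; ++i)
        _mm_stream_si32(dst + i, value);     // non-temporal store
    _mm_sfence();                            // trailing bracket
#else
    // No SSE2: plain stores, so the sketch still compiles elsewhere.
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = value;
#endif
}

// Fill the buffer, raise the flag, and sum the contents.
int streaming_demo() {
    fill_streaming(buffer, 16, 5);
    buffer_ready = 1;                        // published only after the fence
    int sum = 0;
    if (buffer_ready)
        for (std::size_t i = 0; i < 16; ++i)
            sum += buffer[i];
    return sum;
}
```

If the trailing SFENCE were missing, a relaxed store barrier (a no-op on the
CPU) before setting buffer_ready would no longer be enough, which is exactly
the case the questions above are about.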