This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: -fno-tree-cselim not working?
- From: Andi Kleen <andi at firstfloor dot org>
- To: Ian Lance Taylor <iant at google dot com>
- Cc: Andi Kleen <andi at firstfloor dot org>, gcc at gcc dot gnu dot org
- Date: Sun, 28 Oct 2007 12:15:58 +0100
- Subject: Re: -fno-tree-cselim not working?
- References: <20071026070903.GS2896@sunsite.mff.cuni.cz.suse.lists.egcs> <20071026.002701.193707917.davem@davemloft.net.suse.lists.egcs> <20071026074825.GT2896@sunsite.mff.cuni.cz.suse.lists.egcs> <20071026.005802.107688735.davem@davemloft.net.suse.lists.egcs> <m3bqamj5x7.fsf@localhost.localdomain.suse.lists.egcs> <p73k5p9prwx.fsf@bingen.suse.de> <m38x5pfwwc.fsf@localhost.localdomain>
On Fri, Oct 26, 2007 at 01:23:15PM -0700, Ian Lance Taylor wrote:
> Andi Kleen <andi@firstfloor.org> writes:
>
> > Ian Lance Taylor <iant@google.com> writes:
> > >
> > > This code isn't going to be a problem, because spin_unlock presumably
> > > includes a memory barrier.
> >
> > At least in the Linux kernel and also in glibc for mutexes locks are just plain
> > function calls, which are not necessarily full memory barriers.
>
> True, and problematic in some cases--but a function call which gcc
> can't see is a memory barrier for all addressable memory.
I have now constructed a test case to show why the optimization is a bad
idea in general. It essentially measures how much it costs to access
a cache-cold variable. On a Core2 the results are:
% gcc -o tstore tstore.c
% ./tstore
209 cycles
% gcc -O2 -o tstore tstore.c
% ./tstore
671 cycles
It runs about 3x faster without optimization (no if-conversion of
variable++) than with it, because of the cache miss.
Your patch would fix it too because the test uses a function call, but
it might not in the general case, when the condition happens not to be
a function call.
-Andi
(x86 specific, but can be adapted to other architectures)
#include <stdio.h>
#include <string.h>

int GO_SLOW = 0;

#define LARGE (5*1020*1024)
int larger_than_cache[LARGE];	/* touched only to push `variable` out of the cache */
int variable;

int go_slow(void);	/* defined after test() */

/* Read the x86 time stamp counter. */
static inline unsigned long long rdtsc(void)
{
	unsigned a, d;
	asm volatile("rdtsc" : "=a" (a), "=d" (d));
	return a | ((unsigned long long)d) << 32;
}

void test(void)
{
	unsigned long long start, end;

	start = rdtsc();
	if (go_slow())
		variable++;
	end = rdtsc();
	printf("%llu cycles\n", end - start);
}

int go_slow(void)
{
	return GO_SLOW;
}

int main(void)
{
	variable++;
	memset(&larger_than_cache, 0, sizeof larger_than_cache);
	go_slow();
	test();
	return 0;
}
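
For readers who want to see what "if conversion of variable++" means here, the following is a rough hand-written sketch, not GCC's actual output. The names counter, conditional_store and if_converted are made up for illustration. The first function only touches memory when the condition is true; the second approximates the converted form, where the load and the store always execute. With GO_SLOW == 0 that unconditional access is presumably what takes the cache miss in the -O2 timing above.

```c
/* Hypothetical stand-in for `variable` in the test program. */
int counter;

/* Source form: memory is only touched when cond is nonzero. */
void conditional_store(int cond)
{
	if (cond)
		counter++;
}

/*
 * Hand-written approximation of the if-converted form.
 * Assumption: the compiler replaces the branch with an
 * unconditional load, a branch-free select, and an
 * unconditional store.
 */
void if_converted(int cond)
{
	int tmp = counter;	/* load happens even when cond == 0 */
	tmp += (cond != 0);	/* adds 1 only when cond is nonzero */
	counter = tmp;		/* store happens even when cond == 0 */
}
```

In a single thread both functions leave counter with the same value, so the transformation looks harmless; the difference is only in which memory accesses happen, which is what the cycle counts measure.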