SUSv3's "memory location" and threads

Darryl Miles darryl-mailinglists@netbauds.net
Tue Aug 28 21:34:00 GMT 2007


Adam Olsen wrote:
> From what I've seen (at least on x86), cache line size only affects
> performance, not semantics.  If two threads write to different parts
> of a cache line they get "false sharing".  The writes themselves are
> still only a small portion of each cache line.

True, but it's the contention issue that most people are interested in 
with such research.  You may want two CPUs to work in close proximity 
(memory-wise), but not so close that performance suffers, since people 
care about performance more than semantics.  You can't change the 
semantics, you just have to work with them, whereas performance is a 
never-ending goal.

When one CPU is accessing memory, the other CPUs will be snooping its 
memory bus to invalidate, in their own upstream caches (those in closer 
proximity to the CPU), any lines it writes to.

When one CPU then uses atomic instructions (for example, a byte-wise 
atomic exchange on IA32 to implement a spinlock), the entire cache line 
is busied out while the instruction executes.  This affects memory 
access performance for that cache line from every other CPU in the 
system.
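
As a concrete illustration, here is a minimal spinlock sketch using 
GCC's __sync builtins (available since GCC 4.1); on IA32 the builtin 
compiles down to a LOCK-prefixed exchange, which is exactly the 
cache-line-busying operation described above:

typedef volatile int spinlock_t;

static void spin_lock(spinlock_t *lock)
{
	/* Atomically store 1 and fetch the old value; loop while the
	   lock was already held. */
	while (__sync_lock_test_and_set(lock, 1))
		while (*lock)
			;	/* spin on plain reads to reduce bus traffic */
}

static void spin_unlock(spinlock_t *lock)
{
	__sync_lock_release(lock);	/* store 0 with release semantics */
}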

My guess is that these are the issues any such research is trying to 
pin down: the rules of the game and the tradeoffs for a number of 
computing scenarios.


I'm not so sure about your term "false sharing"; from the CPU's point 
of view nothing appears false.  Reads and writes to RAM beyond the last 
cache may be bursted.  If you want a single byte, you get a cache line 
anyway.  When you modify a byte, the entire cache line may be written 
back to RAM (including the unchanged bytes around the single byte you 
changed).

This has the same kind of metrics as accessing a single byte from disk 
when the minimum transfer unit is a 512-byte sector.
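
To make the cache-line granularity concrete, here is a sketch of the 
usual workaround: padding independent per-thread counters out to 
separate cache lines.  The 64-byte figure is an assumption (common on 
current x86 parts); query the real value where it matters:

#define CACHE_LINE 64	/* assumed line size; see below for querying it */

struct shared_counters {
	volatile long a;			/* written by thread A */
	char pad[CACHE_LINE - sizeof(long)];	/* keeps b off a's line */
	volatile long b;			/* written by thread B */
};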

In future, one way to speed up RAM access may be to make the RAM bus 
even wider, so getting used to cache lines now might be considered 
groundwork in that direction.


> Changing this would have far reaching effects.  malloc for instance
> would have to internally align blocks on 64 byte boundaries (or
> whatever the local cache line size is).  In fact, the cache line size
> varies from cpu to cpu, or even within a cpu (L1 vs L2 vs L3).

Not really within one hardware implementation; that makes no sense 
(though it is clearly a practical possibility with NUMA).  That is, I 
believe L1, L2 and L3 of the same physical system will have a common 
cache-line size.  But if you buy another host with the same CPU type, 
it might have a different cache-line size.  There are not too many 
differing cache-line sizes to be concerned about, and it's really a 
case of: do we perform well on the largest cache-line size we're going 
to execute on?

Newer CPUs under Linux have the details in /proc/cpuinfo, presumably 
because the CPUID instruction exposes this information, so it is 
available to userspace.
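
On glibc systems there is also a sysconf() extension for this; a small 
sketch (the _SC_LEVEL1_DCACHE_LINESIZE name is a glibc extension, and 
the call may return 0 where the kernel does not report a value):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Glibc extension: L1 data cache line size in bytes. */
	long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
	printf("L1 dcache line size: %ld bytes\n", line);
	return 0;
}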


> Incorporating the cache line size into the semantics would effectively
> create a new architecture (and the cache line size would have to be
> permanently fixed for that architecture, preventing future changes to
> the cache line size that may improve performance.)

There is architecture and there is architecture.  True, the electronics 
need to be worked out, but as a programmer, who cares about that?  When 
you buy your next system it may be different from your last.  As a 
programmer you just want to know what the size is, so you can build in 
padding (or reorder fields) at those spots where it is needed.


> So I do still think that, for the purposes of C at least, it is set in
> stone.  The most that might happen is for sizeof(void *) to be the
> official size and alignment, not sizeof(int).

Yes, I agree: (void *) over (int), since some 64-bit-capable systems 
still keep (long) at 32 bits, but (void *) always gives away the host's 
natural width, from which everything else is derived.
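
A trivial check makes the point; on an LP64 Linux host this prints 
4/8/8, while a 64-bit LLP64 ABI would print 4/4/8:

#include <stdio.h>

int main(void)
{
	printf("int=%u long=%u void*=%u\n",
	       (unsigned) sizeof(int),
	       (unsigned) sizeof(long),
	       (unsigned) sizeof(void *));
	return 0;
}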


Is there a GNU C directive you can place inside your struct to provide 
packing rules?  A fictitious example for i386:

struct mystruct {
	char c;
	__align__(16);	/* a directive here causing the next declared
			   field to be aligned on this boundary */
	short s;
	__padding__(2);	/* simple way to insert bytes of padding */
	char last_byte;
	__align__(4);	/* forces trailing padding so that
			   sizeof(struct mystruct) is a multiple of 4 */
};

sizeof(struct mystruct) == 24
offsetof(struct mystruct, c) == 0
offsetof(struct mystruct, s) == 16
offsetof(struct mystruct, last_byte) == 20
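
For what it's worth, the closest real GNU C equivalent I know of is the 
aligned attribute on individual members, with explicit pad arrays 
standing in for the fictitious __padding__ directive.  A sketch (note 
that GCC also raises the alignment of the whole struct to match its 
most-aligned member, so sizeof here comes out as 32 rather than 24):

struct mystruct {
	char c;					/* offset 0 */
	short s __attribute__((aligned(16)));	/* s lands at offset 16 */
	char pad[2];				/* explicit padding bytes */
	char last_byte;				/* offset 20 */
};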


Darryl


