SUSv3's "memory location" and threads

Adam Olsen rhamph@gmail.com
Tue Aug 28 23:40:00 GMT 2007


On 8/28/07, Darryl Miles <darryl-mailinglists@netbauds.net> wrote:
> Adam Olsen wrote:
> > From what I've seen (at least on x86), cache line size only affects
> > performance, not semantics.  If two threads write to different parts
> > of a cache line they get "false sharing".  The writes themselves are
> > still only a small portion of each cache line.

> I'm not so sure on your term "false sharing"; from the CPU's point of
> view nothing appears false.  Reads and writes to RAM beyond the last
> cache may be burst.  If you want a single byte, you get a cache-line
> anyway.  When you modify a byte, the entire cache-line may be written back
> to RAM (including the unchanged bytes around the single byte you changed).

"False sharing" is not my term, it seems to be used quite a bit to
this phenomena.
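
For anyone who wants to see it in action, here's a rough sketch
(pthreads; the 64-byte padding and the iteration count are just
assumptions for illustration).  Two threads bump counters that are
adjacent in memory, then counters padded a cache line apart; the
adjacent pair should be noticeably slower because every write
invalidates the other core's copy of the shared line:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

/* Adjacent counters: they share a cache line, so the two writer
 * threads keep invalidating each other's copy of it. */
static struct { unsigned long a, b; } adjacent;

/* Same counters pushed a cache line apart (64 bytes is just an
 * assumed line size); each thread now owns its own line. */
static struct { unsigned long a; char pad[64]; unsigned long b; } padded;

static void *bump(void *p) {
    volatile unsigned long *c = p;   /* volatile so -O2 keeps the stores */
    for (unsigned long i = 0; i < ITERS; i++)
        (*c)++;
    return NULL;
}

static double run_pair(unsigned long *x, unsigned long *y) {
    struct timespec t0, t1;
    pthread_t a, b;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&a, NULL, bump, x);
    pthread_create(&b, NULL, bump, y);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    printf("adjacent: %.2fs\n", run_pair(&adjacent.a, &adjacent.b));
    printf("padded:   %.2fs\n", run_pair(&padded.a, &padded.b));
    return 0;
}

(Compile with something like gcc -O2 -pthread; older glibc may also
want -lrt for clock_gettime.)  The semantics are identical either
way -- only the timing differs.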


> > Changing this would have far reaching effects.  malloc for instance
> > would have to internally align blocks on 64 byte boundaries (or
> > whatever the local cache line size is).  In fact, the cache line size
> > varies from cpu to cpu, or even within a cpu (L1 vs L2 vs L3).
>
> Not really inside one hardware implementation; that makes no sense (but
> it's clearly a practical possibility with NUMA).  That is, I believe L1, L2
> and L3 of the same physical system will have a common cache-line size.
> But another host with the same CPU type might have a
> different cache-line size.  There are not too many differing cache-line
> sizes to be concerned about, and it's really a case of: do we perform well
> on the largest cache-line size we're going to execute on?

The pentium 4 has a 64 byte L1 cache line size and 128 byte L2 cache
line size, but the L2 cache lines are broken up into 64 byte
"sectors".  Itanium has 32 byte L1 cache lines and 64 byte L2 and L3
cache lines.  Itanium 2 has 64 byte L1 cache lines and 128 byte L2 and
L3 cache lines.

As an aside, NUMA will change malloc design anyway.  What fun that we
have not one but two new factors in allocator design!
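
To make that concrete, this is roughly what a cache-line-aware
allocation looks like with what's available today (the
_SC_LEVEL1_DCACHE_LINESIZE query is a glibc extension, and the 64-byte
fallback and 128-byte request size are just assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    /* glibc reports the L1 data cache line size through sysconf();
     * fall back to an assumed 64 bytes if it isn't available. */
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (line <= 0)
        line = 64;

    /* posix_memalign returns a block whose start is line-aligned, so
     * two such blocks never share a line at the front.  The tail can
     * still share a line with a neighbour unless the size is also
     * rounded up to a multiple of the line size. */
    void *p;
    if (posix_memalign(&p, (size_t)line, 128) != 0)
        return 1;

    printf("line size %ld, block at %p\n", line, p);
    free(p);
    return 0;
}

An allocator that had to do this for every small request would waste
most of each line, which is really the point below.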

Of course the point remains that small allocations (a fraction of a
cache line) are fairly common, so as multicore becomes the norm there
may be a great deal of resistance to the current scheme, let alone to
larger cache lines.  Perhaps breaking each cache line up into smaller
units will solve this (maybe that's what the P4's "sectors" are for).


> Newer CPUs with Linux have details in /proc/cpuinfo, maybe because the
> CPUID instruction contains this information so it's available to userspace.
>
>
> > Incorporating the cache line size into the semantics would effectively
> > create a new architecture (and the cache line size would have to be
> > permanently fixed for that architecture, preventing future changes to
> > the cache line size that may improve performance.)
>
> There is architecture and there is architecture.  True, the electronics
> need to be worked out, but as a programmer, who cares about that?  When
> you buy your next system it may be different to your last.  As a
> programmer you just want to know what size it is, to build in padding
> (reorder fields) at those spots where it is needed.
>
>
> > So I do still think that, for the purposes of C at least, it is set in
> > stone.  The most that might happen is for sizeof(void *) to be the
> > official size and alignment, not sizeof(int).
>
> Yes, I agree: (void *) over (int), since some 64-bit capable systems still
> keep (long) as 32 bits, but (void *) always gives away the host's natural
> width, from which everything else is derived.

Hrm yes.  Of course, there's no existing integer type guaranteed to match
whatever rule they come up with, but in my case I've already got a
sufficiently-portable typedef.  Not the end of the world, I suppose.
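
For what it's worth, the idea is roughly this (a sketch, not my exact
typedef; the natural_word_t name is made up, and uintptr_t is C99, so
older compilers need a fallback):

#include <stdint.h>

/* A "natural word" type sized like a pointer, on the theory that the
 * host's pointer width is the width everything else derives from.
 * On pre-C99 compilers something like
 *     typedef unsigned long natural_word_t;
 * is usually close enough, though not guaranteed (e.g. Win64 keeps
 * long at 32 bits). */
typedef uintptr_t natural_word_t;

/* Compile-time check that it really matches sizeof(void *). */
typedef char natural_word_matches_pointer
    [sizeof(natural_word_t) == sizeof(void *) ? 1 : -1];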


-- 
Adam Olsen, aka Rhamphoryncus


