This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: SUSv3's "memory location" and threads


Adam Olsen wrote:
From what I've seen (at least on x86), cache line size only affects
performance, not semantics.  If two threads write to different parts
of a cache line they get "false sharing".  The writes themselves are
still only a small portion of each cache line.

True, but it's the contention issue that most people are interested in with such research. You may want two CPUs to work in close proximity (memory-wise), but not so close that performance suffers, as people care about performance more than semantics. You can't change the semantics, you just have to work with them, whereas performance is a never-ending goal.


When one CPU is accessing memory, the other CPUs will be snooping the memory bus and invalidating any lines they hold for that address in their upstream (closer to the CPU) caches.

When one CPU is using atomic assembly instructions (for example, a byte-wise atomic exchange on IA32 to implement spinlocks), that entire cache line is busied out while the instruction takes place. This would affect memory access performance to that cache line for any other CPU in the system.
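For what it's worth, a minimal sketch of that kind of spinlock, assuming a GCC recent enough to provide the __sync atomic builtins (on IA32 the test-and-set is typically emitted as a locked xchg, which is what busies out the line):

    /* Sketch only: lock word is a single byte, matching the byte-wise
       exchange mentioned above. */
    static volatile unsigned char lock = 0;

    static void spin_lock(void)
    {
        /* Atomically exchange 1 into the lock; keep spinning while the
           previous value was already 1 (some other CPU holds it). */
        while (__sync_lock_test_and_set(&lock, 1))
            ;
    }

    static void spin_unlock(void)
    {
        __sync_lock_release(&lock);   /* store 0 back with release semantics */
    }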

My guess is that this is the issue any such research looks at: finding out the rules of the game and the trade-offs for a number of computing scenarios.


I'm not so sure about your term "false sharing"; from the CPU's point of view nothing appears false. Reads and writes to RAM beyond the last cache may be burst. If you want a single byte, you get a cache line anyway. When you modify a byte, the entire cache line may be written back to RAM (including the unchanged bytes around the single byte you changed).
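As a sketch of the layout concern itself (nothing the compiler does for you): keeping two per-thread counters on separate cache lines is just a matter of alignment. The 64-byte figure below is an assumption, and the aligned attribute is a GCC extension:

    #define ASSUMED_CACHE_LINE 64   /* assumption; the real size varies by CPU */

    struct shared_counters {
        long a;     /* written by thread 1 */
        long b;     /* written by thread 2 -- lands in the same cache line as 'a' */
    };

    struct padded_counters {
        long a __attribute__((aligned(ASSUMED_CACHE_LINE)));
        long b __attribute__((aligned(ASSUMED_CACHE_LINE)));   /* own cache line */
    };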


This has the same kind of metrics as accessing a single byte from disk when the minimum transfer unit is a 512-byte sector.

In the future, one way to speed up RAM access may be to make the RAM bus even wider, so getting used to cache lines now might be considered groundwork in that direction.


Changing this would have far reaching effects.  malloc for instance
would have to internally align blocks on 64 byte boundaries (or
whatever the local cache line size is).  In fact, the cache line size
varies from cpu to cpu, or even within a cpu (L1 vs L2 vs L3).

Not really inside one hardware implementation; that makes no sense (but it is clearly a practical possibility with NUMA). That is, I believe L1, L2 and L3 of the same physical system will have a common cache-line size. But if you buy another host with the same CPU type, it might have a different cache-line size. There are not too many differing cache-line sizes to be concerned about, and it's really a case of: do we perform well on the largest cache-line size we're going to execute on?


Newer CPUs under Linux have details in /proc/cpuinfo, maybe because the CPUID instruction exposes this information, so it's available to userspace.
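Something like the sketch below should also retrieve it from userspace, assuming a glibc that implements the _SC_LEVEL1_DCACHE_LINESIZE sysconf name (older libraries may not have it, in which case it returns 0 or -1):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        printf("L1 data cache line size: %ld bytes\n", line);

        /* Alternative source on newer kernels:
           /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size */
        return 0;
    }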


Incorporating the cache line size into the semantics would effectively
create a new architecture (and the cache line size would have to be
permanently fixed for that architecture, preventing future changes to
the cache line size that may improve performance.)

There is architecture and there is architecture. True, the electronics need to be worked out, but as a programmer, who cares about that? When you buy your next system it may be different from your last. As a programmer you just want to know what the size is, so you can build in padding (or reorder fields) at those spots where it is needed.



So I do still think that, for the purposes of C at least, it is set in
stone.  The most that might happen is for sizeof(void *) to be the
official size and alignment, not sizeof(int).

Yes, I agree with (void *) over (int), since some 64-bit-capable systems still keep (long) at 32 bits, but (void *) always gives away the host's natural width, from which everything else is derived.
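A trivial check of that point, nothing assumed beyond standard C: an LP64 host prints 4/8/8, while a 64-bit host that keeps long at 32 bits (LLP64-style) prints 4/4/8.

    #include <stdio.h>

    int main(void)
    {
        printf("sizeof(int)    = %u\n", (unsigned) sizeof(int));
        printf("sizeof(long)   = %u\n", (unsigned) sizeof(long));
        printf("sizeof(void *) = %u\n", (unsigned) sizeof(void *));
        return 0;
    }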



Is there a GNU C directive you can place inside your struct to provide packing rules (fictitious example for i386)?


struct mystruct {
    char c;
    __align__(16);   /* a directive here to cause the next declared field to be aligned on this boundary */
    short s;
    __padding__(2);  /* simple way to insert bytes of padding */
    char last_byte;
    __align__(4);    /* forces padding at the end to align sizeof(struct mystruct) */
};


sizeof(struct mystruct) == 24
offsetof(struct mystruct, c) == 0
offsetof(struct mystruct, s) == 16
offsetof(struct mystruct, last_byte) == 20
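For comparison, a sketch of how the same layout can be reproduced with plain GNU C today using explicit padding members. (__attribute__((aligned(N))) on the member would also force the 16-byte boundary, but it raises the alignment of the whole struct, so the tail would round past 24 bytes.)

    #include <stddef.h>
    #include <stdio.h>

    /* Explicit padding members reproduce the fictitious layout above. */
    struct mystruct {
        char  c;           /* offset 0  */
        char  pad0[15];    /* pad so the next field lands on a 16-byte boundary */
        short s;           /* offset 16 */
        char  pad1[2];     /* the __padding__(2) from the example */
        char  last_byte;   /* offset 20 */
        char  pad2[3];     /* tail padding so sizeof is a multiple of 4 */
    };

    int main(void)
    {
        printf("sizeof    = %u\n", (unsigned) sizeof(struct mystruct));              /* 24 */
        printf("c         = %u\n", (unsigned) offsetof(struct mystruct, c));         /* 0  */
        printf("s         = %u\n", (unsigned) offsetof(struct mystruct, s));         /* 16 */
        printf("last_byte = %u\n", (unsigned) offsetof(struct mystruct, last_byte)); /* 20 */
        return 0;
    }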


Darryl


