Starting an OpenMP parallel section is extremely slow on a hyper-threaded Nehalem

Tim Prince
Thu Feb 11 18:31:00 GMT 2010

On 2/11/2010 6:57 AM, Tim Prince wrote:
> On 2/11/2010 6:35 AM, Edwin Bennink wrote:
>> Thanks Tim, I thought that the gcc list was the most appropriate one 
>> regarding the gomp implementation, but I'll post this question on the 
>> gcc-help list.
To speed up parallel regions that follow one another in quick succession, 
as in your example, most OpenMP libraries have a latency setting: after a 
parallel region ends, its threads are kept waiting for a while before the 
region is wound down automatically.  When the next region is identical, 
some of the setup is then skipped, and private data may be inherited.  
Even so, this does not perform as well as retaining a single parallel 
region.
For the Intel library, the default latency (KMP_BLOCKTIME) is 0.2 seconds, 
which some people consider far too short and others too long.  I don't 
immediately see an equivalent setting in the libgomp manual.

