This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug libgomp/80822] New: libgomp incorrect affinity when OMP_PLACES=threads
- From: "weeks at iastate dot edu" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 19 May 2017 00:40:03 +0000
- Subject: [Bug libgomp/80822] New: libgomp incorrect affinity when OMP_PLACES=threads
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80822
Bug ID: 80822
Summary: libgomp incorrect affinity when OMP_PLACES=threads
Product: gcc
Version: 6.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: weeks at iastate dot edu
CC: jakub at gcc dot gnu.org
Target Milestone: ---
Created attachment 41385
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41385&action=edit
xthi.c from Cray, Inc. modified to remove MPI code
On the NERSC Cori system, each Haswell node has two Intel Xeon E5-2698 v3
processors, each with 16 CPU cores and Hyper-Threading enabled. With
OMP_PLACES=threads, libgomp from gcc 6.3.0 appears to assume that CPUs
(hardware threads) 0 and 1 share a core, whereas in reality CPUs 0 and 32
share a core, as do 1 and 33, and so on.
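The sibling relationship described above can be checked programmatically
through the Linux sysfs file thread_siblings_list. The following is a minimal
sketch, not part of the attached xthi-omp.c; the function names
parse_cpu_list and read_thread_siblings are hypothetical helpers introduced
only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs-style CPU list such as "0,32" or "0-3,8" into cpus[].
 * Returns the number of CPUs stored, or -1 on malformed input or overflow. */
int parse_cpu_list(const char *s, int *cpus, int max)
{
    assert(s != NULL && cpus != NULL && max > 0);
    int n = 0;
    while (*s != '\0' && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0)
            return -1;              /* no digits, or a negative number */
        long hi = lo;
        if (*end == '-') {          /* range form "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        for (long c = lo; c <= hi; c++) {
            if (n == max)
                return -1;          /* caller's buffer is too small */
            cpus[n++] = (int)c;
        }
        if (*end == ',')
            end++;                  /* skip the separator */
        s = end;
    }
    return n;
}

/* Read the hardware-thread siblings of one CPU from sysfs (Linux only).
 * Returns the number of siblings, or -1 if the file cannot be read. */
int read_thread_siblings(int cpu, int *cpus, int max)
{
    char path[128], line[256];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
             cpu);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int n = fgets(line, sizeof line, f) ? parse_cpu_list(line, cpus, max) : -1;
    fclose(f);
    return n;
}
```

On the node described here, read_thread_siblings(0, cpus, 8) would be expected
to store CPUs 0 and 32, in line with the thread_siblings_list contents for
cpu0 shown in this report.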
To illustrate, attached (xthi-omp.c) is a version of xthi.c from the "Cray XC
Series User Application Placement Guide (CLE 6.0.UP01) S-2496"
(https://pubs.cray.com/content/00330629-DC/FA00256413) that has been modified
to remove the MPI code. The output of the Open MPI 1.10.2 "lstopo --of console"
command (lstopo.out) that shows the processor topology is at the bottom of this
text.
In the first example (OMP_NUM_THREADS=32 OMP_PLACES=threads
OMP_PROC_BIND=spread), CPU cores 0, 2, 4, ..., 30 each have two OpenMP threads,
while CPU cores 1, 3, ..., 31 have none, instead of one thread per core:
======================================================================
$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,32
$ gcc --version
gcc (GCC) 6.3.0 20161221 (Cray Inc.)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -fopenmp -o xthi-omp.x xthi-omp.c
$ OMP_NUM_THREADS=32 OMP_PLACES=threads OMP_PROC_BIND=spread ./xthi-omp.x | sort -k 4n,4n
Hello from thread 0, on nid00009. (core affinity = 0)
Hello from thread 1, on nid00009. (core affinity = 2)
Hello from thread 2, on nid00009. (core affinity = 4)
Hello from thread 3, on nid00009. (core affinity = 6)
Hello from thread 4, on nid00009. (core affinity = 8)
Hello from thread 5, on nid00009. (core affinity = 10)
Hello from thread 6, on nid00009. (core affinity = 12)
Hello from thread 7, on nid00009. (core affinity = 14)
Hello from thread 8, on nid00009. (core affinity = 16)
Hello from thread 9, on nid00009. (core affinity = 18)
Hello from thread 10, on nid00009. (core affinity = 20)
Hello from thread 11, on nid00009. (core affinity = 22)
Hello from thread 12, on nid00009. (core affinity = 24)
Hello from thread 13, on nid00009. (core affinity = 26)
Hello from thread 14, on nid00009. (core affinity = 28)
Hello from thread 15, on nid00009. (core affinity = 30)
Hello from thread 16, on nid00009. (core affinity = 32)
Hello from thread 17, on nid00009. (core affinity = 34)
Hello from thread 18, on nid00009. (core affinity = 36)
Hello from thread 19, on nid00009. (core affinity = 38)
Hello from thread 20, on nid00009. (core affinity = 40)
Hello from thread 21, on nid00009. (core affinity = 42)
Hello from thread 22, on nid00009. (core affinity = 44)
Hello from thread 23, on nid00009. (core affinity = 46)
Hello from thread 24, on nid00009. (core affinity = 48)
Hello from thread 25, on nid00009. (core affinity = 50)
Hello from thread 26, on nid00009. (core affinity = 52)
Hello from thread 27, on nid00009. (core affinity = 54)
Hello from thread 28, on nid00009. (core affinity = 56)
Hello from thread 29, on nid00009. (core affinity = 58)
Hello from thread 30, on nid00009. (core affinity = 60)
Hello from thread 31, on nid00009. (core affinity = 62)
======================================================================
In the second example, OMP_PROC_BIND=close places one OpenMP thread on each
core, the opposite of the intended effect (two threads per core on adjacent
cores):
======================================================================
$ OMP_NUM_THREADS=32 OMP_PLACES=threads OMP_PROC_BIND=close ./xthi-omp.x | sort -k 4n,4n
Hello from thread 0, on nid00009. (core affinity = 0)
Hello from thread 1, on nid00009. (core affinity = 1)
Hello from thread 2, on nid00009. (core affinity = 2)
Hello from thread 3, on nid00009. (core affinity = 3)
Hello from thread 4, on nid00009. (core affinity = 4)
Hello from thread 5, on nid00009. (core affinity = 5)
Hello from thread 6, on nid00009. (core affinity = 6)
Hello from thread 7, on nid00009. (core affinity = 7)
Hello from thread 8, on nid00009. (core affinity = 8)
Hello from thread 9, on nid00009. (core affinity = 9)
Hello from thread 10, on nid00009. (core affinity = 10)
Hello from thread 11, on nid00009. (core affinity = 11)
Hello from thread 12, on nid00009. (core affinity = 12)
Hello from thread 13, on nid00009. (core affinity = 13)
Hello from thread 14, on nid00009. (core affinity = 14)
Hello from thread 15, on nid00009. (core affinity = 15)
Hello from thread 16, on nid00009. (core affinity = 16)
Hello from thread 17, on nid00009. (core affinity = 17)
Hello from thread 18, on nid00009. (core affinity = 18)
Hello from thread 19, on nid00009. (core affinity = 19)
Hello from thread 20, on nid00009. (core affinity = 20)
Hello from thread 21, on nid00009. (core affinity = 21)
Hello from thread 22, on nid00009. (core affinity = 22)
Hello from thread 23, on nid00009. (core affinity = 23)
Hello from thread 24, on nid00009. (core affinity = 24)
Hello from thread 25, on nid00009. (core affinity = 25)
Hello from thread 26, on nid00009. (core affinity = 26)
Hello from thread 27, on nid00009. (core affinity = 27)
Hello from thread 28, on nid00009. (core affinity = 28)
Hello from thread 29, on nid00009. (core affinity = 29)
Hello from thread 30, on nid00009. (core affinity = 30)
Hello from thread 31, on nid00009. (core affinity = 31)
======================================================================
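The per-core thread counts claimed for the two gcc runs above can be tallied
from the reported CPU numbers. This is a small sketch under the assumption,
taken from the topology in this report, that CPUs n and n + 32 share core n;
the names core_of_cpu and threads_per_core are hypothetical.

```c
#include <assert.h>

enum { NUM_CORES = 32 };   /* 2 sockets x 16 cores on the node in this report */

/* Map an OS CPU number to its core on this node. Assumption: CPUs n and
 * n + 32 are hardware-thread siblings, as thread_siblings_list reports. */
int core_of_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < 2 * NUM_CORES);
    return cpu % NUM_CORES;
}

/* Count how many of the nthreads bound CPUs land on each core. */
void threads_per_core(const int *cpu_of_thread, int nthreads,
                      int counts[NUM_CORES])
{
    for (int c = 0; c < NUM_CORES; c++)
        counts[c] = 0;
    for (int t = 0; t < nthreads; t++)
        counts[core_of_cpu(cpu_of_thread[t])]++;
}
```

Feeding in the spread result (thread t on CPU 2t) gives two threads on every
even-numbered core and none on the odd ones; feeding in the close result
(thread t on CPU t) gives exactly one thread per core.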
The Intel 17.0.2 OpenMP runtime uses the correct affinity in both cases:
======================================================================
$ icc --version
icc (ICC) 17.0.2 20170213
Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
$ icc -qopenmp -o ./xthi-omp.x xthi-omp.c
$ OMP_NUM_THREADS=32 OMP_PLACES=threads OMP_PROC_BIND=spread ./xthi-omp.x | sort -k 4n,4n
Hello from thread 0, on nid00009. (core affinity = 0)
Hello from thread 1, on nid00009. (core affinity = 1)
Hello from thread 2, on nid00009. (core affinity = 2)
Hello from thread 3, on nid00009. (core affinity = 3)
Hello from thread 4, on nid00009. (core affinity = 4)
Hello from thread 5, on nid00009. (core affinity = 5)
Hello from thread 6, on nid00009. (core affinity = 6)
Hello from thread 7, on nid00009. (core affinity = 7)
Hello from thread 8, on nid00009. (core affinity = 8)
Hello from thread 9, on nid00009. (core affinity = 9)
Hello from thread 10, on nid00009. (core affinity = 10)
Hello from thread 11, on nid00009. (core affinity = 11)
Hello from thread 12, on nid00009. (core affinity = 12)
Hello from thread 13, on nid00009. (core affinity = 13)
Hello from thread 14, on nid00009. (core affinity = 14)
Hello from thread 15, on nid00009. (core affinity = 15)
Hello from thread 16, on nid00009. (core affinity = 16)
Hello from thread 17, on nid00009. (core affinity = 17)
Hello from thread 18, on nid00009. (core affinity = 18)
Hello from thread 19, on nid00009. (core affinity = 19)
Hello from thread 20, on nid00009. (core affinity = 20)
Hello from thread 21, on nid00009. (core affinity = 21)
Hello from thread 22, on nid00009. (core affinity = 22)
Hello from thread 23, on nid00009. (core affinity = 23)
Hello from thread 24, on nid00009. (core affinity = 24)
Hello from thread 25, on nid00009. (core affinity = 25)
Hello from thread 26, on nid00009. (core affinity = 26)
Hello from thread 27, on nid00009. (core affinity = 27)
Hello from thread 28, on nid00009. (core affinity = 28)
Hello from thread 29, on nid00009. (core affinity = 29)
Hello from thread 30, on nid00009. (core affinity = 30)
Hello from thread 31, on nid00009. (core affinity = 31)
$ OMP_NUM_THREADS=32 OMP_PLACES=threads OMP_PROC_BIND=close ./xthi-omp.x | sort -k 4n,4n
Hello from thread 0, on nid00009. (core affinity = 0)
Hello from thread 1, on nid00009. (core affinity = 32)
Hello from thread 2, on nid00009. (core affinity = 1)
Hello from thread 3, on nid00009. (core affinity = 33)
Hello from thread 4, on nid00009. (core affinity = 2)
Hello from thread 5, on nid00009. (core affinity = 34)
Hello from thread 6, on nid00009. (core affinity = 3)
Hello from thread 7, on nid00009. (core affinity = 35)
Hello from thread 8, on nid00009. (core affinity = 4)
Hello from thread 9, on nid00009. (core affinity = 36)
Hello from thread 10, on nid00009. (core affinity = 5)
Hello from thread 11, on nid00009. (core affinity = 37)
Hello from thread 12, on nid00009. (core affinity = 6)
Hello from thread 13, on nid00009. (core affinity = 38)
Hello from thread 14, on nid00009. (core affinity = 7)
Hello from thread 15, on nid00009. (core affinity = 39)
Hello from thread 16, on nid00009. (core affinity = 8)
Hello from thread 17, on nid00009. (core affinity = 40)
Hello from thread 18, on nid00009. (core affinity = 9)
Hello from thread 19, on nid00009. (core affinity = 41)
Hello from thread 20, on nid00009. (core affinity = 10)
Hello from thread 21, on nid00009. (core affinity = 42)
Hello from thread 22, on nid00009. (core affinity = 11)
Hello from thread 23, on nid00009. (core affinity = 43)
Hello from thread 24, on nid00009. (core affinity = 12)
Hello from thread 25, on nid00009. (core affinity = 44)
Hello from thread 26, on nid00009. (core affinity = 13)
Hello from thread 27, on nid00009. (core affinity = 45)
Hello from thread 28, on nid00009. (core affinity = 14)
Hello from thread 29, on nid00009. (core affinity = 46)
Hello from thread 30, on nid00009. (core affinity = 15)
Hello from thread 31, on nid00009. (core affinity = 47)
======================================================================
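Both sets of results are consistent with the same close/spread rules applied
to differently ordered place lists: the gcc output matches a list in OS CPU
number order (0, 1, 2, ...), and the Intel output matches a list in which
sibling hardware threads are adjacent (0, 32, 1, 33, ...). This is an
inference from the output above, not a statement about either runtime's source
code. The sketch below is a simplified model of the binding policies for the
case where the thread count divides the place count; all function names are
hypothetical.

```c
#include <assert.h>

enum { CORES = 32, PLACES = 64 };   /* one place per hardware thread */

/* Place list in plain OS-number order: 0, 1, 2, ..., 63. */
void places_by_os_number(int place_cpu[PLACES])
{
    for (int p = 0; p < PLACES; p++)
        place_cpu[p] = p;
}

/* Place list in topology order with siblings adjacent:
 * 0, 32, 1, 33, ..., 31, 63 (assumes CPUs n and n + 32 share a core). */
void places_by_core(int place_cpu[PLACES])
{
    for (int p = 0; p < PLACES; p++)
        place_cpu[p] = p / 2 + (p % 2) * CORES;
}

/* Simplified binding, assuming the primary thread starts on place 0,
 * nthreads <= PLACES, and PLACES is divisible by nthreads:
 *   close : thread t takes place t
 *   spread: thread t takes the first place of the t-th block of
 *           PLACES / nthreads consecutive places
 * Returns the OS CPU number that thread t is bound to. */
int bound_cpu(const int place_cpu[PLACES], int nthreads, int t, int spread)
{
    assert(nthreads > 0 && nthreads <= PLACES && PLACES % nthreads == 0);
    assert(t >= 0 && t < nthreads);
    int stride = spread ? PLACES / nthreads : 1;
    return place_cpu[t * stride];
}
```

With 32 threads, the OS-number list yields CPU 2t for spread and CPU t for
close, as in the gcc runs. The topology-ordered list yields CPU t for spread
and the sequence 0, 32, 1, 33, ... for close, as in the Intel runs.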
Output of "lstopo --of console":
======================================================================
Machine (126GB total)
  NUMANode L#0 (P#0 63GB) + Package L#0 + L3 L#0 (40MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#32)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#33)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#34)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#35)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#36)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#37)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#38)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#39)
    L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
      PU L#16 (P#8)
      PU L#17 (P#40)
    L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
      PU L#18 (P#9)
      PU L#19 (P#41)
    L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
      PU L#20 (P#10)
      PU L#21 (P#42)
    L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
      PU L#22 (P#11)
      PU L#23 (P#43)
    L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
      PU L#24 (P#12)
      PU L#25 (P#44)
    L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
      PU L#26 (P#13)
      PU L#27 (P#45)
    L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
      PU L#28 (P#14)
      PU L#29 (P#46)
    L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
      PU L#30 (P#15)
      PU L#31 (P#47)
  NUMANode L#1 (P#1 63GB) + Package L#1 + L3 L#1 (40MB)
    L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
      PU L#32 (P#16)
      PU L#33 (P#48)
    L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17
      PU L#34 (P#17)
      PU L#35 (P#49)
    L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
      PU L#36 (P#18)
      PU L#37 (P#50)
    L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
      PU L#38 (P#19)
      PU L#39 (P#51)
    L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20
      PU L#40 (P#20)
      PU L#41 (P#52)
    L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21
      PU L#42 (P#21)
      PU L#43 (P#53)
    L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22
      PU L#44 (P#22)
      PU L#45 (P#54)
    L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23
      PU L#46 (P#23)
      PU L#47 (P#55)
    L2 L#24 (256KB) + L1d L#24 (32KB) + L1i L#24 (32KB) + Core L#24
      PU L#48 (P#24)
      PU L#49 (P#56)
    L2 L#25 (256KB) + L1d L#25 (32KB) + L1i L#25 (32KB) + Core L#25
      PU L#50 (P#25)
      PU L#51 (P#57)
    L2 L#26 (256KB) + L1d L#26 (32KB) + L1i L#26 (32KB) + Core L#26
      PU L#52 (P#26)
      PU L#53 (P#58)
    L2 L#27 (256KB) + L1d L#27 (32KB) + L1i L#27 (32KB) + Core L#27
      PU L#54 (P#27)
      PU L#55 (P#59)
    L2 L#28 (256KB) + L1d L#28 (32KB) + L1i L#28 (32KB) + Core L#28
      PU L#56 (P#28)
      PU L#57 (P#60)
    L2 L#29 (256KB) + L1d L#29 (32KB) + L1i L#29 (32KB) + Core L#29
      PU L#58 (P#29)
      PU L#59 (P#61)
    L2 L#30 (256KB) + L1d L#30 (32KB) + L1i L#30 (32KB) + Core L#30
      PU L#60 (P#30)
      PU L#61 (P#62)
    L2 L#31 (256KB) + L1d L#31 (32KB) + L1i L#31 (32KB) + Core L#31
      PU L#62 (P#31)
      PU L#63 (P#63)
======================================================================
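In the lstopo output, L# is hwloc's logical index (topology order) and P# is
the OS CPU number. The gap between the two numberings is what this report
hinges on, so here is a small sketch that extracts both indices from a PU line
and states the mapping visible in the listing above. The function names are
hypothetical, and expected_os_index applies only to this particular node.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a line of the form "PU L#<logical> (P#<os>)", ignoring leading
 * whitespace. Returns 1 on success, 0 if the line is not a PU line. */
int parse_pu_line(const char *line, int *logical, int *os_index)
{
    assert(line != NULL && logical != NULL && os_index != NULL);
    return sscanf(line, " PU L#%d (P#%d)", logical, os_index) == 2;
}

/* OS CPU number for a logical PU index on this node, as read off the
 * listing above: the two PUs of core k are numbered k and k + 32. */
int expected_os_index(int logical)
{
    assert(logical >= 0 && logical < 64);
    return logical / 2 + (logical % 2) * 32;
}
```

For example, the line "PU L#1 (P#32)" parses to logical index 1 and OS index
32, which is the second hardware thread of core 0.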