This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug libgomp/79784] New: Synchronization overhead is thrashing on Aarch64
- From: "cbz at baozis dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 01 Mar 2017 15:01:25 +0000
- Subject: [Bug libgomp/79784] New: Synchronization overhead is thrashing on Aarch64
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79784
Bug ID: 79784
Summary: Synchronization overhead is thrashing on Aarch64
Product: gcc
Version: 7.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: cbz at baozis dot org
CC: jakub at gcc dot gnu.org
Target Milestone: ---
I have recently been running several programs (caffe, mxnet, openblas,
blis...) on aarch64, and I found a performance regression when libgomp (the GCC
implementation of OpenMP) is used and OMP_NUM_THREADS is set to a value greater
than 2. Almost half of the execution time is spent in either
gomp_barrier_wait_end() or gomp_team_barrier_wait_end(). I then ran the EPCC
OpenMP micro-benchmark suite to measure the overhead of GOMP's synchronization
mechanisms on Aarch64, and it looks pretty bad: the PARALLEL overhead varies
from ~1ms to ~2000ms.
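For readers unfamiliar with this kind of measurement, here is a minimal sketch
of how a per-region PARALLEL overhead can be estimated. It is written in the
spirit of the EPCC suite but is not the EPCC source; the names now_seconds,
delay, time_reference, time_parallel and parallel_overhead are all made up for
illustration. The idea is to time the same work with and without the parallel
directive and divide the difference by the number of repetitions.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical busy-work payload; volatile keeps the loop from being
   optimized away. */
void delay(int length)
{
    volatile double a = 0.0;
    for (int i = 0; i < length; i++)
        a += (double)i;
}

/* Reference measurement: `reps` sequential calls to delay(). */
double time_reference(int reps, int length)
{
    assert(reps > 0);
    double start = now_seconds();
    for (int r = 0; r < reps; r++)
        delay(length);
    return now_seconds() - start;
}

/* Test measurement: `reps` parallel regions, every thread runs delay().
   Without -fopenmp the pragma is ignored and this equals the reference. */
double time_parallel(int reps, int length)
{
    assert(reps > 0);
    double start = now_seconds();
    for (int r = 0; r < reps; r++) {
#pragma omp parallel
        {
            delay(length);
        }
    }
    return now_seconds() - start;
}

/* Estimated overhead of one parallel region, in seconds. Timing noise can
   make this slightly negative when the true overhead is near zero. */
double parallel_overhead(int reps, int length)
{
    double ref = time_reference(reps, length);
    double par = time_parallel(reps, length);
    return (par - ref) / reps;
}
```

Built with gcc -fopenmp and called as, say, parallel_overhead(1000, 500) under
different OMP_NUM_THREADS values, this gives a rough per-region figure. The
repetition count and payload length here are arbitrary, not the EPCC defaults.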
I used Linux perf to find the hot spots of the program. Most of the execution
time is spent in the barrier loop that waits for the other threads to
synchronize. In gomp_barrier_wait_end(), it is the following section:
      do
        do_wait ((int *) &bar->generation, state);
      while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE) == state);
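To make the shape of that loop easier to discuss, here is a minimal
single-threaded sketch of a spin-then-block wait. It assumes, as I understand
the Linux configuration of libgomp, that do_wait polls the generation counter
for a bounded number of iterations before sleeping on a futex. This is not the
libgomp source: every sketch_* name is hypothetical, the blocking primitive is
passed in as a callback, and sketch_fake_block only simulates another thread
releasing the barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the barrier's generation counter. */
typedef struct {
    atomic_uint generation;
} sketch_barrier;

/* Hypothetical blocking primitive (on Linux this role would be played by a
   futex wait). It must return once *addr may have changed. */
typedef void (*sketch_block_fn)(atomic_uint *addr, unsigned val);

/* Poll *addr up to spin_count times. Returns 0 as soon as the value differs
   from val, or 1 if the spin budget ran out and the caller should block. */
int sketch_spin(atomic_uint *addr, unsigned val, unsigned long spin_count)
{
    for (unsigned long i = 0; i < spin_count; i++)
        if (atomic_load_explicit(addr, memory_order_relaxed) != val)
            return 0;
    return 1;
}

/* Same shape as the loop quoted above: wait, then recheck with acquire
   ordering. Returns how many times the blocking path was taken. */
unsigned long sketch_barrier_wait(sketch_barrier *bar, unsigned state,
                                  unsigned long spin_count,
                                  sketch_block_fn block)
{
    assert(block != NULL);
    unsigned long blocked = 0;
    do {
        if (sketch_spin(&bar->generation, state, spin_count)) {
            block(&bar->generation, state);
            blocked++;
        }
    } while (atomic_load_explicit(&bar->generation,
                                  memory_order_acquire) == state);
    return blocked;
}

/* Demo-only stand-in for blocking: pretends the last thread arrived while
   we slept, by advancing the generation. */
void sketch_fake_block(atomic_uint *addr, unsigned val)
{
    (void)val;
    atomic_fetch_add_explicit(addr, 1, memory_order_release);
}
```

In this sketch, a waiter whose spin budget expires before the generation
changes pays for both the spinning and the blocking call, so time attributed
to the wait loop grows with the spin count and with how late the last thread
arrives.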
I'm not sure whether this is a known issue on Aarch64. If so, is there any
way to fix it?