This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libgomp/79784] New: Synchronization overhead is thrashing on Aarch64


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79784

            Bug ID: 79784
           Summary: Synchronization overhead is thrashing on Aarch64
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cbz at baozis dot org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

I have recently been running several programs (caffe, mxnet, openblas,
blis...) on aarch64, and I found a performance regression when libgomp (the GCC
implementation of OpenMP) is used and OMP_NUM_THREADS is set to a value greater
than 2. Almost half of the execution time is spent in either
gomp_barrier_wait_end() or gomp_team_barrier_wait_end(). I then ran the EPCC
OpenMP micro-benchmark suite to measure the overhead of GOMP's synchronization
mechanisms on Aarch64, and it looks pretty bad: the PARALLEL overhead varies
from ~1ms to ~2000ms.

I used Linux perf to find the hot spots of the program. Most of the execution
time is spent in the barrier loop that waits for the other threads to
synchronize. In gomp_barrier_wait_end(), it is the following section:

     do
       do_wait ((int *) &bar->generation, state);
     while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE) == state);
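
For readers unfamiliar with this pattern, here is a minimal sketch of a
generation-counter barrier with a bounded spin phase. It is only an
illustration of the general technique, not libgomp's code: the type
sketch_barrier, the sketch_barrier_* functions, and the spin_limit parameter
are all hypothetical names, and the blocking (futex) fallback that a real
implementation would use after the spin phase is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified barrier; this is NOT libgomp's gomp_barrier_t. */
typedef struct {
    unsigned total;      /* number of threads that must arrive */
    unsigned awaited;    /* threads that have not arrived yet */
    unsigned generation; /* incremented once per completed barrier */
} sketch_barrier;

void sketch_barrier_init(sketch_barrier *bar, unsigned count)
{
    assert(count > 0);
    bar->total = count;
    bar->awaited = count;
    bar->generation = 0;
}

/* Record the arrival of one thread and store the generation it observed
   in *state. Returns true if the caller was the last thread to arrive;
   in that case the barrier is reset and the generation is advanced,
   which is what releases the waiting threads. */
bool sketch_barrier_arrive(sketch_barrier *bar, unsigned *state)
{
    *state = __atomic_load_n(&bar->generation, __ATOMIC_ACQUIRE);
    if (__atomic_sub_fetch(&bar->awaited, 1, __ATOMIC_ACQ_REL) != 0)
        return false;
    __atomic_store_n(&bar->awaited, bar->total, __ATOMIC_RELAXED);
    __atomic_add_fetch(&bar->generation, 1, __ATOMIC_RELEASE);
    return true;
}

/* Poll the generation at most spin_limit times. Returns true as soon as
   it differs from state (the barrier completed). Returns false if the
   spin budget ran out; a real implementation would then block, e.g. on
   a futex, and re-check the generation after waking up. */
bool sketch_barrier_spin(sketch_barrier *bar, unsigned state,
                         unsigned long spin_limit)
{
    for (unsigned long i = 0; i < spin_limit; i++) {
        if (__atomic_load_n(&bar->generation, __ATOMIC_ACQUIRE) != state)
            return true;
    }
    return false;
}
```

In this sketch every waiter repeatedly reads the same generation word until
the last thread writes it, so the cost of a barrier depends on how long the
waiters spin and on how quickly that write becomes visible to them. libgomp
documents the GOMP_SPINCOUNT and OMP_WAIT_POLICY environment variables as
controls for spinning behaviour; whether they affect the numbers above has
not been checked here.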

I'm not sure whether this is a known issue on Aarch64. If it is, is there any
way to fix it?
