This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604

--- Comment #20 from Alexander Nesterovskiy <alexander.nesterovskiy at intel dot com> ---
I've made test runs on Broadwell and Skylake (RHEL 7.3).
410.bwaves became faster with r256990, but it is still not as fast as it was at r253678.
410.bwaves performance with "-Ofast -funroll-loops -flto -ftree-parallelize-loops=4":

revision  performance relative to r253678
r253678   100%
r253679    54%
...
r256989    54%
r256990    71%
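As a rough reading of the table, the C sketch below computes how much of the r253679 regression r256990 recovers, and how much longer r256990 still runs than r253678. It assumes the percentages are ratios of SPEC scores, which are inversely proportional to run time. The function names are hypothetical and only illustrate the arithmetic.

```c
/* Fraction of the performance lost at the regressing revision that a later
   revision wins back. Inputs are relative-performance percentages
   (baseline r253678 = 100). */
static double recovered_fraction(double baseline, double regressed, double current)
{
    return (current - regressed) / (baseline - regressed);
}

/* Run time of a revision relative to the baseline, assuming the
   percentages are score ratios (score is proportional to 1 / run time). */
static double runtime_ratio(double baseline, double current)
{
    return baseline / current;
}
```

With the numbers above, recovered_fraction(100, 54, 71) = 17/46, about 0.37, so roughly a third of the lost performance is back, and runtime_ratio(100, 71) is about 1.41, i.e. r256990 still takes about 41% longer than r253678.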

CPU time is now distributed more evenly across threads (~34% on thread 0, ~22% on each of threads 1-3), but a lot of time is spent spinning in:
libgomp.so.1.0.0/gomp_barrier_wait_end -> do_wait -> do_spin
and
libgomp.so.1.0.0/gomp_team_barrier_wait_end -> do_wait -> do_spin
Spin time is ~10% of CPU time at r253678 and ~30% of CPU time at r256990.
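To illustrate why barrier waits show up as CPU time rather than idle time, here is a simplified spin-then-yield barrier in C. This is a hypothetical model, NOT libgomp's code: as far as I understand, libgomp's Linux port spins for a bounded number of iterations (tunable via GOMP_SPINCOUNT and OMP_WAIT_POLICY) and then blocks on a futex, whereas this sketch falls back to sched_yield() to stay short. All names here (spin_barrier, spin_limit, run_demo, etc.) are made up for the example.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical spin-then-yield barrier; a simplified model, NOT libgomp's code.
   Waiters busy-wait for up to spin_limit iterations (the time a profiler
   would attribute to a spin function), then fall back to sched_yield(). */
typedef struct {
    atomic_uint arrived;     /* threads that reached the barrier this round */
    atomic_uint generation;  /* incremented by the last thread to arrive */
    unsigned nthreads;
    unsigned long spin_limit;
    atomic_ulong spin_iters; /* total busy-wait iterations across all waiters */
} spin_barrier;

static void spin_barrier_init(spin_barrier *b, unsigned nthreads,
                              unsigned long spin_limit)
{
    atomic_init(&b->arrived, 0);
    atomic_init(&b->generation, 0);
    b->nthreads = nthreads;
    b->spin_limit = spin_limit;
    atomic_init(&b->spin_iters, 0);
}

static void spin_barrier_wait(spin_barrier *b)
{
    unsigned gen = atomic_load(&b->generation);
    if (atomic_fetch_add(&b->arrived, 1) + 1 == b->nthreads) {
        /* Last arrival: reset the count, then release the waiters. */
        atomic_store(&b->arrived, 0);
        atomic_fetch_add(&b->generation, 1);
        return;
    }
    unsigned long spins = 0;
    while (atomic_load(&b->generation) == gen) {
        if (spins < b->spin_limit)
            spins++;        /* busy-wait: this thread keeps its CPU */
        else
            sched_yield();  /* spin budget exhausted: offer the CPU away */
    }
    atomic_fetch_add(&b->spin_iters, spins);
}

/* Demo: each thread increments a shared counter, then meets the others at
   the barrier; after the barrier all increments of the round must be visible. */
#define MAX_THREADS 8
static spin_barrier demo_barrier;
static atomic_int demo_counter;
static atomic_int demo_errors;
static unsigned demo_nthreads;
static int demo_rounds;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int r = 0; r < demo_rounds; r++) {
        atomic_fetch_add(&demo_counter, 1);
        spin_barrier_wait(&demo_barrier);
        if (atomic_load(&demo_counter) != (r + 1) * (int)demo_nthreads)
            atomic_fetch_add(&demo_errors, 1);
        spin_barrier_wait(&demo_barrier); /* keep rounds from overlapping */
    }
    return NULL;
}

/* Returns the number of inconsistencies observed (0 expected), or -1 on bad input. */
static int run_demo(unsigned nthreads, int rounds, unsigned long spin_limit)
{
    pthread_t tids[MAX_THREADS];
    if (nthreads == 0 || nthreads > MAX_THREADS)
        return -1;
    spin_barrier_init(&demo_barrier, nthreads, spin_limit);
    atomic_store(&demo_counter, 0);
    atomic_store(&demo_errors, 0);
    demo_nthreads = nthreads;
    demo_rounds = rounds;
    for (unsigned i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, NULL);
    for (unsigned i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&demo_errors);
}
```

In this model, a larger spin_limit lets waiters react to the release sooner but adds busy-wait time (counted in spin_iters) whenever work between barriers is unevenly split across threads. That is the kind of cost the do_spin samples in the profiles reflect.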
