This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used
- From: "alexander.nesterovskiy at intel dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 30 Jan 2018 14:07:46 +0000
- Auto-submitted: auto-generated
- References: <bug-82604-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
--- Comment #20 from Alexander Nesterovskiy <alexander.nesterovskiy at intel dot com> ---
I ran tests on Broadwell and Skylake machines under RHEL 7.3.
410.bwaves got faster with r256990, but it is still not as fast as it was at r253678.
410.bwaves performance with "-Ofast -funroll-loops -flto
-ftree-parallelize-loops=4":
rev        perf. relative to r253678
r253678    100%
r253679     54%
...
r256989     54%
r256990     71%
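
To put these figures in terms of runtime, they can be converted as in the rough Python sketch below. It assumes the percentages are throughput-like scores (higher is better, inversely proportional to runtime), which the report does not state explicitly; the function and variable names are illustrative only.

```python
# Relative performance per revision, in percent of r253678 (from the report).
# Assumption: higher is better and the score is inversely proportional to runtime.
perf = {"r253678": 100.0, "r253679": 54.0, "r256989": 54.0, "r256990": 71.0}

baseline = perf["r253678"]


def runtime_factor(rev):
    """Runtime relative to the baseline revision (1.0 means equal)."""
    return baseline / perf[rev]


def recovered_fraction(before, after):
    """Share of the lost performance regained between two revisions."""
    return (perf[after] - perf[before]) / (baseline - perf[before])


for rev in perf:
    print(f"{rev}: {runtime_factor(rev):.2f}x baseline runtime")

print(f"r256990 recovers {recovered_fraction('r256989', 'r256990'):.0%} of the loss")
```

Under that assumption, r256990 regains a bit more than a third of the performance lost at r253679, so most of the regression is still present.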
The CPU time distribution became flatter (~34% in thread 0, ~22% in each of
threads 1-3), but a lot of time is spent spinning in
libgomp.so.1.0.0/gomp_barrier_wait_end -> do_wait -> do_spin
and
libgomp.so.1.0.0/gomp_team_barrier_wait_end -> do_wait -> do_spin
r253678: spin time is ~10% of CPU time
r256990: spin time is ~30% of CPU time
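
The thread shares and spin fractions can be cross-checked with a little arithmetic. The Python sketch below is only an estimate: it assumes the ~22% figure applies to each of threads 1-3 and that spin time is counted inside the quoted CPU time, neither of which is spelled out in the profile. All names are illustrative.

```python
# Approximate CPU time share per thread on r256990, in percent.
# Assumption: ~22% applies to each of threads 1-3.
shares = [34.0, 22.0, 22.0, 22.0]

total = sum(shares)             # should come out close to 100
mean = total / len(shares)
imbalance = max(shares) / mean  # 1.0 would be a perfectly flat profile

# Spin time as a fraction of total CPU time, per revision (from the report).
spin = {"r253678": 0.10, "r256990": 0.30}
# Assumption: spin time is part of CPU time, so the remainder is time
# spent outside do_spin.
useful = {rev: 1.0 - s for rev, s in spin.items()}

print(f"total share: {total:.0f}%, imbalance: {imbalance:.2f}")
for rev, u in useful.items():
    print(f"{rev}: ~{u:.0%} of CPU time outside do_spin")
```

Read this way, the four shares account for all of the CPU time, and the fraction of CPU time spent outside do_spin is noticeably lower on r256990 than on r253678.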