This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: How to get GCC on par with ICC?


Hi Steve,

On Fri, Jun 08 2018, Steve Ellcey wrote:
> On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote:
>> 
>> When we do our own comparisons of GCC vs. ICC on benchmarks
>> like SPEC CPU 2006/2017 ICC doesn't have a big lead over GCC
>> (in fact it even trails in some benchmarks) unless you get to
>> "SPEC tricks" like data structure re-organization optimizations that
>> probably never apply in practice on real-world code (and people
>> should fix such things at the source level being pointed at them
>> via actually profiling their codes).
>
> Richard,
>
> I was wondering if you have any more details about these comparisons
> you have done that you can share?  Compiler versions, options used,
> hardware, etc.  Also, were there any tests that stood out in terms of
> ICC outperforming GCC?

Mostly AMD Ryzen, GCC 8 vs ICC 18.  We compared a few combinations
of options.  When we compared ICC's -Ofast with ours (with or without
native GCC march/mtune, and with a set of ICC options that hopefully
generates the best code for Ryzen), we found that without LTO/IPO, GCC
is actually slightly ahead of ICC on integer benchmarks (both SPEC 2006
and 2017).

Floating-point results were more of a mixed bag (mostly because ICC
performed surprisingly poorly without IPO on a few benchmarks) but at
least on SPEC 2017, ICC was clearly better... with a caveat, see my
comment about wrf below.

With LTO/IPO, ICC can perform a few memory-reorg tricks that push it
quite a bit ahead of us, but I'm not convinced it can perform these
transformations on much source code that does not happen to be a
well-known benchmark.  So I'd recommend always looking at non-IPO
numbers too.

>
> I did a compare of SPEC 2017 rate using GCC 8.* (pre release) and
> a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4).
> I used '-xHost -O3' for icc and '-march=native -mtune=native -O3'
> for gcc.

Please try with -Ofast too.  The main reason is that -O3 does not imply
-ffast-math, and the performance gain from it is often very big (I
suspect the 525.x264_r difference is because of that).  Alternatively,
if your own workloads require high-precision floating-point math, you
have to force ICC to use it to get a fair comparison.  -Ofast also turns
on -fno-protect-parens and -fstack-arrays, which also help a few
benchmarks a lot, but note that you may need to set a large stack ulimit
for them not to crash (ICC does the same thing, as far as we know).
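
To illustrate why -ffast-math matters for both speed and results: it
permits the compiler to reassociate floating-point operations, which is
typically what allows a sum reduction to be vectorized into per-lane
partial sums.  The sketch below is not GCC output; it only mimics the
two evaluation orders in Python.  The function names, the two-lane
layout and the input data are assumptions chosen for illustration.

```python
def sum_in_order(values):
    """Add strictly left to right, the order the source code spells out."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_lanes(values, lanes=2):
    """Keep one partial sum per lane and combine them at the end,
    mimicking a reassociated (vectorized) reduction."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sum_in_order(partial)


# Hypothetical input: a large value that cancels, plus two small ones.
data = [1e16, 1.0, -1e16, 1.0]

print(sum_in_order(data))   # 1.0: the first 1.0 is absorbed by 1e16
print(sum_in_lanes(data))   # 2.0: the small values meet in their own lane
```

Neither order is "the" correct one in general; the point is only that
the results can differ, so a fair comparison needs both compilers to
run under the same floating-point rules.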

>
> The int rate numbers (running 1 copy only) were not too bad, GCC was
> only about 2% slower and only 525.x264_r seemed way slower with GCC.
> The fp rate numbers (again only 1 copy) showed a larger difference, 
> around 20%.  521.wrf_r was more than twice as slow when compiled with
> GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
> significant slowdowns when compiled with GCC vs. ICC.
>

Keep in mind that when discussing FP benchmarks, the math library used
can be (almost) as important as the compiler.  In the case of 481.wrf,
we found that the GCC 8 + glibc 2.26 (so the "out-of-the-box" GNU)
performance is about 70% of ICC's.  When we just linked against AMD's
libm, we got to 83%.  When we instructed GCC to generate calls to Intel's
SVML library and linked against it, we got to 91%.  Using both SVML and
AMD's libm, we achieved 93%.

That means that there is likely still 7% to be gained from more clever
optimizations in GCC, but the real problem is in GNU libm.  And 481.wrf
is perhaps the most extreme example but definitely not the only one.
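
The arithmetic behind that conclusion can be spelled out with a small
sketch.  It uses only the four 481.wrf figures quoted above (ICC = 100);
the dictionary keys and function names are hypothetical labels, not
anything taken from the benchmark setup.

```python
# Relative 481.wrf performance with ICC = 100, as quoted in this message.
WRF_RELATIVE_PERF = {
    "glibc libm": 70,        # GCC 8 + glibc 2.26
    "AMD libm": 83,          # GCC 8 linked against AMD's libm
    "SVML": 91,              # GCC 8 generating calls to Intel's SVML
    "SVML + AMD libm": 93,   # both libraries
}


def gap_breakdown(perf, baseline, best):
    """Split the gap to ICC into the part closed by swapping math
    libraries and the part that remains for the compiler itself."""
    total_gap = 100 - perf[baseline]
    closed = perf[best] - perf[baseline]
    return {
        "total_gap": total_gap,
        "closed_by_libraries": closed,
        "remaining": 100 - perf[best],
        "library_share": closed / total_gap,
    }


def speedup_over(perf, baseline):
    """Speedup of each configuration relative to the baseline one."""
    return {name: value / perf[baseline] for name, value in perf.items()}


breakdown = gap_breakdown(WRF_RELATIVE_PERF, "glibc libm", "SVML + AMD libm")
print(breakdown)
print(speedup_over(WRF_RELATIVE_PERF, "glibc libm"))
```

On these numbers, 23 of the 30 points separating GCC from ICC (roughly
three quarters) disappear by changing math libraries alone, which
leaves 7 points for the compiler.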

Martin

