This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: How to get GCC on par with ICC?
Hi Steve,
On Fri, Jun 08 2018, Steve Ellcey wrote:
> On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote:
>>
>> When we do our own comparisons of GCC vs. ICC on benchmarks
>> like SPEC CPU 2006/2017 ICC doesn't have a big lead over GCC
>> (in fact it even trails in some benchmarks) unless you get to
>> "SPEC tricks" like data structure re-organization optimizations that
>> probably never apply in practice on real-world code (and people
>> should fix such things at the source level being pointed at them
>> via actually profiling their codes).
>
> Richard,
>
> I was wondering if you have any more details about these comparisons
> you have done that you can share? Compiler versions, options used,
> hardware, etc. Also, were there any tests that stood out in terms of
> icc outperforming GCC?
Mostly AMD Ryzen, GCC 8 vs. ICC 18. We compared a few combinations
of options. When we compared ICC's -Ofast with ours (with or without
native GCC -march/-mtune, and with a set of ICC options that hopefully
generates the best code for Ryzen), we found that without LTO/IPO, GCC
is actually slightly ahead of ICC on integer benchmarks (both SPEC 2006
and 2017).

Floating-point results were a more mixed bag (mostly because ICC
performed surprisingly poorly without IPO on a few) but at least on SPEC
2017, ICC's results were clearly better... with a caveat, see below my
comment about wrf.

With LTO/IPO, ICC can perform a few memory-reorg tricks that push them
quite a bit ahead of us, but I'm not convinced they can perform these
transformations on much source code that happens not to be a well-known
benchmark. So I'd recommend always looking at non-IPO numbers too.
>
> I did a compare of SPEC 2017 rate using GCC 8.* (pre release) and
> a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4).
> I used '-xHost -O3' for icc and '-march=native -mtune=native -O3'
> for gcc.
Please try with -Ofast too. The main reason is that -O3 does not imply
-ffast-math, and the performance gain from it is often very big (I
suspect the 525.x264_r difference is because of that). Alternatively,
if your own workloads require high-precision floating-point math, you
have to force ICC to use it as well to get a fair comparison. -Ofast
also turns on the Fortran-specific -fno-protect-parens and
-fstack-arrays, which help a few benchmarks a lot, but note that you may
need to set a large stack ulimit for them not to crash (ICC does the
same thing, as far as we know).
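
To illustrate the -ffast-math point with a minimal sketch (a
hypothetical kernel, not taken from SPEC): a floating-point reduction
such as the one below has to be summed in source order at -O3, because
reordering the additions can change the rounded result, so GCC does not
vectorize it. With -Ofast, reassociation is permitted and the loop can
be vectorized.

```c
#include <stddef.h>

/* Hypothetical example kernel: sum an array of floats.
 * At -O3 the additions must happen in source order (strict FP
 * semantics), which blocks vectorization of the reduction.
 * At -Ofast (-ffast-math) GCC may reassociate and vectorize it. */
float sum_array(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Compiling with something like "gcc -O3 -march=native
-fopt-info-vec-missed -c sum.c" and then again with -Ofast in place of
-O3 (and -fopt-info-vec) should show the difference; sum.c is just an
assumed file name.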
>
> The int rate numbers (running 1 copy only) were not too bad, GCC was
> only about 2% slower and only 525.x264_r seemed way slower with GCC.
> The fp rate numbers (again only 1 copy) showed a larger difference,
> around 20%. 521.wrf_r was more than twice as slow when compiled with
> GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
> significant slowdowns when compiled with GCC vs. ICC.
>
Keep in mind that when discussing FP benchmarks, the math library used
can be (almost) as important as the compiler. In the case of 481.wrf,
we found that the GCC 8 + glibc 2.26 (so the "out-of-the-box" GNU)
performance is about 70% of ICC's. When we just linked against AMD's
libm, we got to 83%. When we instructed GCC to generate calls to Intel's
SVML library and linked against it, we got to 91%. Using both SVML and
AMD's libm, we achieved 93%.

That means that there is likely still 7% to be gained from more clever
optimizations in GCC, but the real problem is in GNU libm. And 481.wrf
is perhaps the most extreme example but definitely not the only one.
Martin