How to get GCC on par with ICC?

Szabolcs Nagy szabolcs.nagy@arm.com
Fri Jun 22 22:41:00 GMT 2018


On 11/06/18 11:05, Martin Jambor wrote:
>> The int rate numbers (running 1 copy only) were not too bad, GCC was
>> only about 2% slower and only 525.x264_r seemed way slower with GCC.
>> The fp rate numbers (again only 1 copy) showed a larger difference,
>> around 20%.  521.wrf_r was more than twice as slow when compiled with
>> GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
>> significant slowdowns when compiled with GCC vs. ICC.
>>
> 
> Keep in mind that when discussing FP benchmarks, the used math library
> can be (almost) as important as the compiler.  In the case of 481.wrf,
> we found that the GCC 8 + glibc 2.26 (so the "out-of-the box" GNU)
> performance is about 70% of ICC's.  When we just linked against AMD's
> libm, we got to 83%. When we instructed GCC to generate calls to Intel's
> SVML library and linked against it, we got to 91%.  Using both SVML and
> AMD's libm, we achieved 93%.
> 

i think glibc 2.27 should outperform amd's libm on wrf
(since i upstreamed the single precision code from
https://github.com/ARM-software/optimized-routines/ )

the 83% -> 93% diff is there because gcc fails to turn
math calls in fortran into vectorized libmvec calls.

> That means that there likely still is 7% to be gained from more clever
> optimizations in GCC but the real problem is in GNU libm.  And 481.wrf
> is perhaps the most extreme example but definitely not the only one.

there is no longer a problem in gnu libm for the most
common single precision calls and if things go well
then glibc 2.28 will get double precision improvements
too.

but gcc has to learn how to use libmvec in fortran.
