This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: Optimization for Newer Processors


Brian McGrew wrote:
> Top of the morning to y'all!
> 
> I'm a bit curious as to what optimization flags are in the newest compilers
> and how they'd work with the newest CPUs versus the last generation of
> CPUs.
> 
> Our older systems were Dell Precision T5400 workstations with dual Intel Xeon
> 5420 CPUs at 2.33GHz with 6MB of cache per core.  The cache breaks out to
> be 32k of L1 cache and 6MB of L2 cache.
> 
> Now, we're getting Dell Precision T5500 workstations with dual Intel Xeon
> 5506 CPUs at 2.21GHz with 4MB of cache per core.  But, the cache breaks
> down as 32k L1 cache, 256k L2 cache and 4096k L3 cache.
> 
> Our application is very processor and disk I/O intensive and it runs about
> 6x slower on the newer hardware versus the old.  We're currently compiling
> with gcc-4.1.1 using the following optimization flags on Fedora Core 5 using
> a 2.6.16.16 kernel.  As it happens, the code runs seamlessly on CentOS 5.2
> with a 2.6.18 kernel as well.  Upgrading compilers, if there is a compelling
> reason, is an option for us.  Upgrading kernels, at this time, is not an
> option because of 3rd party hardware support.
> 
Basically, you seem to be saying that the old scheduler doesn't work well
for you, while the newer one is OK. That's not under the control of gcc.
You didn't say whether you set the NUMA option in BIOS.  If you wish to
run with NUMA, so as to get the advantage of local memory access, and want
high level affinity control from gcc, you might upgrade to a gcc version
which supports libgomp, use OpenMP directives in the important code
sections, and set the GOMP_CPU_AFFINITY environment variable.
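
As a rough sketch of the OpenMP route described above (not the poster's
code): the function names and the file name affinity_demo.c are made up
for illustration, and the CPU list used below is an assumption, since
CPU numbering differs from system to system.

```c
/* Sketch only; affinity_demo.c is a hypothetical file name. */
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* only available when compiling with -fopenmp */
#endif

/* Sum of i*i for i in [0, n).  With -fopenmp the iterations are split
   across the thread team; without it the pragma is ignored and the
   loop runs serially. */
double sum_squares(long n)
{
    double total = 0.0;
    long i;
#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
        total += (double)i * (double)i;
    return total;
}

/* Number of threads libgomp starts for a parallel region
   (always 1 when built without -fopenmp). */
int team_size(void)
{
    int n = 1;
#ifdef _OPENMP
#pragma omp parallel
    {
#pragma omp master
        n = omp_get_num_threads();
    }
#endif
    return n;
}

#ifdef DEMO_MAIN
int main(void)
{
    printf("threads: %d\n", team_size());
    printf("sum:     %.0f\n", sum_squares(1000));
    return 0;
}
#endif
```

Built with something like
`gcc -O2 -fopenmp -DDEMO_MAIN affinity_demo.c -o affinity_demo`, it could
then be run as `GOMP_CPU_AFFINITY="0-7" ./affinity_demo`, which asks
libgomp to bind its threads to CPUs 0 through 7 in order.  Which of those
numbers belong to which socket has to be checked on the machine itself.
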
Your description of the cache is contradictory.  The last level cache on
the older CPU is shared between 2 cores, and on the newer one it's shared
among 4 cores, unless you have the rare entry level model with only 2
cores.  Do you really have such an extremely small mid level cache?  That
seems like a handicap; you seem to be comparing a top of the line CPU of 2
years ago against a new bottom of the line.  I don't know why anyone would
choose a dual socket dual core over single socket quad core with the newer
model.
For the newer CPU model, if auto-vectorization is useful for your
application, -mtune=barcelona would be useful.  -msse4 would likely be
useful on the older one as well.  You would need a current gcc for these
features, and you may need a more current binutils.
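
To make the auto-vectorization point concrete, here is the kind of loop
the vectorizer can usually handle.  This is an illustrative sketch only:
saxpy and the file name saxpy.c are hypothetical, and whether a given
loop in the real application vectorizes depends on the code and the gcc
version.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
   __restrict (the GCC spelling, accepted in every language mode)
   promises that x and y do not overlap, which removes one common
   obstacle to vectorization. */
void saxpy(size_t n, float a, const float *__restrict x,
           float *__restrict y)
{
    size_t i;
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

A possible compile line, assuming a gcc recent enough to accept these
options, is `gcc -O3 -ftree-vectorize -mtune=barcelona -c saxpy.c`.
Adding an -msse4 style option lets the compiler use those instructions,
but the resulting binary will then only run on CPUs that implement them.
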

