This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Performance optimizations for Intel Core 2 and Core i7 processors


On May 20, 2010, at 8:04 AM, Steven Bosscher wrote:

> On Mon, May 17, 2010 at 8:44 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
>> CodeSourcery is working on improving performance for Intel's Core 2 and Core
>> i7 families of processors.
>> 
>> CodeSourcery plans to add support for unaligned vector instructions, to
>> provide fine-tuned scheduling support and to update instruction selection
>> and instruction cost models for Core i7 and Core 2 families of processors.
>> 
>> As usual, CodeSourcery will be contributing its work to GCC.  Currently, our
>> target is the end of GCC 4.6 Stage1.
>> 
>> If your favorite benchmark significantly under-performs on Core 2 or Core i7
>> CPUs, don't hesitate asking us to take a look at it.
> 
> I'd like to ask you to look at ffmpeg (missed core2 vectorization
> opportunities), polyhedron (PR34501, like, duh! :-), and Apache
> benchmark (-mtune=core2 results in lower scores).
> 
> You could check overall effects on an openly available benchmark suite
> such as http://www.phoronix-test-suite.com/
> 
> Good luck with this project, it'll be great when -mtune=core2 actually
> improves performance rather than degrading it!
> 
> Ciao!
> Steven

ffmpeg is built with -fno-tree-vectorize: there was a miscompilation with the vectorizer on PPC, the maintainer is too shy to file compiler bugs about it, and that probably won't change. It's still worth looking at, though, since improvements there might help other programs.

Some numbers decoding H264 on Core i5 x86-64:
asm on: 8.78s
asm off (./configure --disable-asm): 15.61s
asm off + -ftree-vectorize -ftree-slp-vectorize -fstrict-aliasing: 14.84s

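So there's a lot of room there: the vectorizer flags save only about 5%, while the hand-written asm is still about 1.7x faster than the vectorized C.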

I haven't investigated, but I guess some useful missing features are small-vector vectorization using MMX (ffmpeg uses it everywhere) and scalar write-combining (http://x264dev.multimedia.cx/?p=32). And better scheduling/shorter code in general.
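
To make "scalar write-combining" concrete, here is a minimal sketch, assuming the term means merging several adjacent narrow stores into one wider store. The function names (fill4_bytewise, fill4_combined, check_fill4) are made up for illustration and are not taken from ffmpeg or x264.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Straightforward version: four separate 8-bit stores. */
static void fill4_bytewise(uint8_t *dst, uint8_t v)
{
    dst[0] = v;
    dst[1] = v;
    dst[2] = v;
    dst[3] = v;
}

/* Hand-combined version: replicate the byte into a 32-bit word and
 * write it with a single 4-byte copy. memcpy keeps this valid under
 * -fstrict-aliasing and for unaligned dst; with optimization enabled,
 * compilers typically lower a fixed-size 4-byte memcpy to one store. */
static void fill4_combined(uint8_t *dst, uint8_t v)
{
    uint32_t word = v * UINT32_C(0x01010101);
    memcpy(dst, &word, sizeof word);
}

/* Both versions must leave identical bytes in memory. */
static void check_fill4(void)
{
    uint8_t a[4] = {0}, b[4] = {0};
    fill4_bytewise(a, 0xAB);
    fill4_combined(b, 0xAB);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

The wish would be for the compiler to turn the first form into the second by itself. Comparing the output of gcc -O2 -S for the two functions shows whether a given GCC version does so.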

