This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Performance optimizations for Intel Core 2 and Core i7 processors
On May 20, 2010, at 8:04 AM, Steven Bosscher wrote:
> On Mon, May 17, 2010 at 8:44 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
>> CodeSourcery is working on improving performance for Intel's Core 2 and Core
>> i7 families of processors.
>>
>> CodeSourcery plans to add support for unaligned vector instructions, to
>> provide fine-tuned scheduling support and to update instruction selection
>> and instruction cost models for Core i7 and Core 2 families of processors.
>>
>> As usual, CodeSourcery will be contributing its work to GCC. Currently, our
>> target is the end of GCC 4.6 Stage1.
>>
>> If your favorite benchmark significantly under-performs on Core 2 or Core i7
>> CPUs, don't hesitate asking us to take a look at it.
>
> I'd like to ask you to look at ffmpeg (missed core2 vectorization
> opportunities), polyhedron (PR34501, like, duh! :-), and Apache
> benchmark (-mtune=core2 results in lower scores).
>
> You could check overall effects on an openly available benchmark suite
> such as http://www.phoronix-test-suite.com/
>
> Good luck with this project, it'll be great when -mtune=core2 actually
> improves performance rather than degrading it!
>
> Ciao!
> Steven
ffmpeg builds with -fno-tree-vectorize. There was some miscompilation with the vectorizer on PPC, the maintainer is too shy to file compiler bugs about it, and that probably won't change. But it's still worth looking at, since any improvements might help other programs.
Some numbers decoding H264 on Core i5 x86-64:
asm on: 8.78s
asm off (./configure --disable-asm): 15.61s
asm off + -ftree-vectorize -ftree-slp-vectorize -fstrict-aliasing: 14.84s
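So there's a lot of room there: the extra flags buy only about 5% over the plain C build, while the hand-written asm is still about 1.8x faster.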
I haven't investigated, but I guess some useful missing features are small-vector vectorization using MMX (ffmpeg uses it everywhere) and scalar write-combining (http://x264dev.multimedia.cx/?p=32). And better scheduling/shorter code in general.
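To make "scalar write-combining" concrete, here is a minimal sketch of the kind of transformation I mean. The function names are made up for illustration and this is not code from ffmpeg or x264; it just shows several adjacent narrow stores that could be merged into one wider store. The memcpy is there so the example stays valid under -fstrict-aliasing and on targets that care about alignment; compilers typically turn a fixed-size memcpy like this into a single 32-bit store.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: fill four adjacent bytes with the same value
 * using four separate 8-bit stores. */
void fill4_bytewise(uint8_t *dst, uint8_t v)
{
    dst[0] = v;
    dst[1] = v;
    dst[2] = v;
    dst[3] = v;
}

/* Same result with the stores combined by hand: replicate the byte
 * into all four lanes of a 32-bit word, then write the word once.
 * All four bytes are equal, so byte order does not matter here. */
void fill4_combined(uint8_t *dst, uint8_t v)
{
    uint32_t w = v * 0x01010101u;   /* e.g. 0xAB -> 0xABABABAB */
    memcpy(dst, &w, sizeof w);      /* one 32-bit store in practice */
}
```

Both functions leave the same four bytes in dst. The point is that the compiler would ideally produce the second form from the first on its own, instead of the source having to spell it out.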