This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: gcc will become the best optimizing x86 compiler

From: Agner Fog <agner at agner dot org>
To: Michael Meissner <gnu at the-meissners dot org>, Agner Fog <agner at agner dot org>, Raksit Ashok <raksit at google dot com>, dclarke at opensolaris dot org, gcc at gcc dot gnu dot org, TimothyPrince at sbcglobal dot net
Date: Sat, 26 Jul 2008 10:17:28 +0200
Subject: Re: gcc will become the best optimizing x86 compiler
References: <2E073B3ABB3F664DBA1D1C4D5FB47EF40EBDAD8E@NT-IRVA-0752.brcm.ad.broadcom.com> <4887592E.4040804@agner.org> <a6265da20807231908h106c44a0s6271c09152f92ce3@mail.gmail.com> <4888375A.30601@agner.org> <9718380a0807241009m563e8da1x2983145a93e3ce7b@mail.gmail.com> <48897BFA.8000008@agner.org> <20080725220958.GA4900@tiktok.the-meissners.org>

Michael Meissner wrote:

> On Fri, Jul 25, 2008 at 09:08:42AM +0200, Agner Fog wrote:
>> GNU libc could borrow a lot of optimized functions from OpenSolaris, Mac, and other open source projects. They look better than GNU libc, but there is still room for improvement. For example, OpenSolaris does not use XMM registers for strlen, although this is simpler than using general purpose registers (see my code at www.agner.org/optimize/asmlib.zip).
>
> Note that glibc can only take code that is appropriately licensed and donated to the FSF. In addition, it must meet the coding standards for glibc.

The Mac/Xnu and OpenSolaris projects have fairly liberal public licenses. If there are legal differences, maybe the copyright owner is open to negotiation. My own code is GPL-licensed. The fact that I am offering my code to you also means, of course, that I am willing to grant the necessary license.

> Also note that what is fastest for the operation depends on the underlying chip (for example, using XMM registers is not faster on current AMD platforms).

Indeed. That's why I am talking about CPU dispatching (i.e. different branches for different CPUs). The CPU dispatching can be done with just a single jump instruction: At the function entry there is an indirect jump through a pointer to the appropriate version. The code pointer initially points to a CPU dispatcher. The CPU dispatcher detects which CPU it is running on, and replaces the code pointer with a pointer to the appropriate version, then jumps to the pointer. The next time the function is called, it follows the pointer directly to the right version.
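
The dispatching scheme described above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: every name here (my_strlen, my_strlen_ptr, my_strlen_dispatch, my_strlen_generic, my_strlen_sse2) is hypothetical and not taken from asmlib or glibc, the "SSE2" version is a stand-in that simply calls strlen, and the CPU detection assumes the __builtin_cpu_init and __builtin_cpu_supports builtins found in current GCC and Clang on x86.

```c
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical, chosen only for this sketch. */

/* Plain byte-by-byte version, usable on any CPU. */
static size_t my_strlen_generic(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Stand-in for a version written with XMM registers; a real one
   would be hand-written assembly or intrinsics. */
static size_t my_strlen_sse2(const char *s)
{
    return strlen(s);
}

static size_t my_strlen_dispatch(const char *s);

/* The code pointer. It initially points to the CPU dispatcher. */
static size_t (*my_strlen_ptr)(const char *) = my_strlen_dispatch;

/* Runs on the first call only: detects the CPU, replaces the code
   pointer with the appropriate version, then jumps to it. */
static size_t my_strlen_dispatch(const char *s)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Assumption: compiler builtins for x86 CPU detection. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        my_strlen_ptr = my_strlen_sse2;
    else
        my_strlen_ptr = my_strlen_generic;
#else
    my_strlen_ptr = my_strlen_generic;
#endif
    return my_strlen_ptr(s);
}

/* Public entry point: a single indirect call through the pointer. */
size_t my_strlen(const char *s)
{
    return my_strlen_ptr(s);
}
```

After the first call to my_strlen, the pointer no longer refers to the dispatcher, so later calls go straight to the selected version. The sketch does not address thread safety of the pointer update, which a real library would have to consider.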

My memcpy runs faster with XMM registers than with 64-bit x64 registers on AMD K8. My strlen runs slower with XMM registers than with 64-bit x64 registers on AMD K8.

I expect the XMM versions to run much faster on AMD K10, because it has full 128-bit execution units and data paths, whereas K8 has only 64 bits. I have not had the chance to test this on AMD K10 yet.

I believe it is best to optimize for the newest processors, because the processor that is brand new today will become mainstream in a few years.

> Memcpy/memset optimizations were added to glibc 2.8, though when your favorite distribution will provide it is a different question: http://sourceware.org/ml/libc-alpha/2008-04/msg00050.html

I have libc version 2.7. Can't find version 2.8.
