This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: gcc will become the best optimizing x86 compiler


On Wed, Jul 30, 2008 at 5:57 PM, Denys Vlasenko
<vda.linux@googlemail.com> wrote:
> On Fri, Jul 25, 2008 at 9:08 AM, Agner Fog <agner@agner.org> wrote:
>> Raksit Ashok wrote:
>>>There is a more optimized version for 64-bit:
>>>http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s
>>>I think this looks similar to your implementation, Agner.
>>
>> Yes it is similar to my code.
>
> 3164 line source file which implements memcpy().
> You've got to be kidding.
> How much of L1 icache it blows away in the process?
> I bet it performs wonderfully on microbenchmarks though.
>
>   2991                 .balign 16               # sadistic alignment strikes again
>   2992 L(bkPxQx):      .int L(bkP0Q0)-L(bkPxQx) # why use two bytes when
> we can use four?
>
> Seriously. What possible reason can there be to align
> a randomly accessed data table to 16 bytes?
> 4 bytes I understand, but 16?

I'm afraid I sounded a bit confrontational above, so here is the
clarification. I have nothing against making code faster,
but there should be some balance between the -O999 mindset
and the -Os mindset. If you just found a tweak which gives you a 1.2%
speedup in a microbenchmark but the code grew 4 times bigger, *stop*.
Think about it.

"We unrolled the loop two gazillion times and it's 3% faster now"
is a similarly bad idea.

I must admit that I didn't look too closely at
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s
but at first glance it sure looks like someone
got carried away a bit.
--
vda

