This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: GCC performance regression - its memset !

From: Michel LESPINASSE <walken at zoy dot org>
To: Jan Hubicka <jh at suse dot cz>
Cc: Richard Henderson <rth at redhat dot com>, gcc list <gcc at gcc dot gnu dot org>
Date: Tue, 23 Apr 2002 13:35:26 -0700
Subject: Re: GCC performance regression - its memset !
References: <20020421005718.GA16378@zoy.org> <20020422213222.GA21429@zoy.org> <20020422165953.A32536@redhat.com> <20020423001045.GA26276@zoy.org> <20020423092540.GC27274@atrey.karlin.mff.cuni.cz>

On Tue, Apr 23, 2002 at 11:25:40AM +0200, Jan Hubicka wrote:
> I guess the inlining threshold is too low or the default memset
> implementation too lame.  I was tuning it for Athlon, so the
> mileage may vary from CPU to CPU.  I will investigate the
> miscompilation first and check this second.

> Concerning the inlining, gcc inlines all memcpys with size smaller
> than 64 bytes. Perhaps this should be extended to 128 bytes in case
> we are still about 2 times as bad. This is partly due to the lame
> implementation of memset in glibc too :(

When gcc does the inlining, performance seems not to be so bad. There
is probably still some untapped performance though, as some of the
initial and final alignment checks could be omitted when gcc already
knows the alignment of the memory zone (in my test case it was an
array of shorts in the data segment, so it was known to be on at
least a two-byte boundary). But that might be hard to code into gcc,
I don't know.
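
To make the point concrete, here is a rough sketch of the kind of
code I have in mind. It is hand-written C, not what gcc emits, and
the name clear_shorts is made up for illustration. Since the pointer
is a short *, it cannot sit on an odd address, so the byte-granular
head and tail handling of a generic memset goes away and only one
optional two-byte store is left on each end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero n shorts starting at p.
 * Assumes only what the type guarantees, i.e. 2-byte alignment. */
static void clear_shorts(short *p, size_t n)
{
    const uint32_t zero = 0;

    /* Head: p cannot be odd, so there is no single-byte prologue.
     * At most one 2-byte store is needed to reach a 4-byte boundary. */
    if (n > 0 && ((uintptr_t)p & 2)) {
        *p++ = 0;
        n--;
    }

    /* Body: 4 bytes (two shorts) per iteration. The fixed-size memcpy
     * keeps this valid C; compilers usually turn it into one store. */
    while (n >= 2) {
        memcpy(p, &zero, sizeof zero);
        p += 2;
        n -= 2;
    }

    /* Tail: at most one short remains, never a lone byte. */
    if (n)
        *p = 0;
}
```

A generic memset(p, 0, n * sizeof *p) has to assume an arbitrary
byte address and byte count, which is where the extra checks come
from.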

Also, as I've only been giving bad news up to now, I wanted to say
that now that I've worked around the two issues I had with inlining
and with memset, the 3.1 snapshot does provide superior performance
on my libmpeg2 codebase: about 5% faster than 2.95.4, and that gets
up to 8% when using -fbranch-probabilities and 9% when using
-mcpu=athlon-tbird instead of the more generic -mcpu=pentiumpro.
Nice work guys ! I am still worried, though, that other people will
have the same trouble with inlining as I did and will not see all of
the performance improvements as a result.

Cheers,

-- 
Michel "Walken" LESPINASSE
Is this the best that god can do ? Then I'm not impressed.

