This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: GCC performance regression - its memset !


> On Tue, Apr 23, 2002 at 11:25:40AM +0200, Jan Hubicka wrote:
> > I guess the inlining threshold is too low or the default memset
> > implementation too lame.  I was tuning it for Athlon, so the
> > mileage may vary from CPU to CPU.  I will investigate the
> > miscompilation first and check this second.
> 
> > Concerning the inlining, gcc inlines all memcpys with size smaller
> > than 64 bytes. Perhaps this should be extended to 128 bytes in case
> > we are still about 2 times as bad. This is partly due to lame
> > implementation of memset in glibc too :(
> 
> When gcc does the inlining, performance seems to not be so bad. There
> is probably still some untapped performance though, as some of the
> initial and final alignment checks could be omitted when gcc already
> knows about the alignment of the memory zone (like in my test case, it

When it knows, it should avoid them.  Definitely for an array of shorts, the
alignment to an even byte is not done.  It is difficult to make it assume that
an array of shorts is 4-byte aligned: the ABI does not specify this, so it may
not be.  GCC has new alignment tracking code, so it should be better than any
previous version, but it is still not that good.  (For instance, when the array
is static, it definitely has a chance to conclude so, but it does not; however,
the majority of string functions are called on computed addresses.)
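
To illustrate the alignment point, here is a minimal sketch, assuming a
compiler that accepts GCC's `aligned` attribute.  The buffer names and
the helper functions are hypothetical, made up for this example.  It only
shows how the guaranteed alignment of a static array can be stated
explicitly; whether a given GCC version then drops the alignment checks
of an inlined memset is not something this code proves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffers for illustration.  A plain array of shorts is
   only guaranteed the alignment of short itself (2 bytes on x86). */
static short plain_buf[64];

/* The aligned attribute (a GCC extension) requests 16-byte alignment,
   so the compiler has the information it would need to skip an
   alignment prologue. */
static short aligned_buf[64] __attribute__((aligned(16)));

/* Returns 1 if p is a multiple of n bytes (n must be a power of two). */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Clears both buffers with sizes known at compile time. */
static void clear_buffers(void)
{
    memset(plain_buf, 0, sizeof plain_buf);
    memset(aligned_buf, 0, sizeof aligned_buf);
}
```

Comparing the assembly generated for the two memset calls (gcc -O2 -S)
is the simplest way to see whether the extra alignment is being used.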

> was an array of shorts in the data segment, so it was known to be on a
> two-byte boundary at least). But might be hard to code into gcc, I
> don't know.
> 
> Also as I've been only giving bad news up to now, I wanted to say that
> now that I've worked around the two issues I had with inlining and
> with memset, the 3.1 snapshot does provide superior performance on my
> libmpeg2 codebase, about 5% faster than 2.95.4, and that gets up to 8%
> when using -fbranch-probabilities and 9% when using -mcpu=athlon-tbird

That sounds good :)

> instead of the more generic -mcpu=pentiumpro. Nice work guys ! I am
> still worried though, that other people will have the same trouble
> with inlining as I did and not see all of the performance improvements
> as a result.

I will send a patch to increase the constant to 128.  I was re-benchmarking
the code, and on P4/Athlon with my assembly memset, 64 is just on the border
(i.e. the inlined and non-inlined solutions differ by less than 10%), so
setting it to 128 does not make us lose anything.  For the glibc
implementation, inlining at 128 is still a win :(
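
For anyone who wants to repeat the comparison, a rough sketch of the two
variants follows.  This is not the benchmark I used; the function names
are hypothetical, and the volatile function pointer is just one assumed
way of forcing a real call to the library memset instead of an inline
expansion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SMALL = 64 };   /* the size threshold discussed in this thread */

/* The size is a compile-time constant, so this call is a candidate
   for inline expansion by the compiler. */
static void fill_inline_candidate(unsigned char *dst, int value)
{
    memset(dst, value, SMALL);
}

/* Calling through a volatile function pointer keeps the compiler from
   expanding the call inline, so the library memset is used. */
static void *(*volatile library_memset)(void *, int, size_t) = memset;

static void fill_library(unsigned char *dst, int value)
{
    library_memset(dst, value, SMALL);
}

/* Returns 1 if all n bytes of buf equal value, 0 otherwise. */
static int all_equal(const unsigned char *buf, size_t n, unsigned char value)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (buf[i] != value)
            return 0;
    return 1;
}
```

Both variants must leave the buffer in the same state; timing each one
in a loop of a few million iterations, with SMALL set to 64 and then
128, gives the inlined/not inlined comparison.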

Honza
> 
> Cheers,
> 
> -- 
> Michel "Walken" LESPINASSE
> Is this the best that god can do ? Then I'm not impressed.

