This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: GCC performance regression - its memset !
On Tue, Apr 23, 2002 at 11:25:40AM +0200, Jan Hubicka wrote:
> I guess the inlining threshold is too low or the default memset
> implementation too lame. I was tuning it for Athlon, so the
> mileage may vary from CPU to CPU. I will investigate the
> miscompilation first and check this second.
> Concerning the inlining, gcc inlines all memcpys with size smaller
> than 64 bytes. Perhaps this should be extended to 128 bytes in case
> we are still about 2 times as bad. This is partly due to lame
> implementation of memset in glibc too :(
When gcc does the inlining, performance seems to not be so bad. There
is probably still some untapped performance though, as some of the
initial and final alignment checks could be omitted when gcc already
knows about the alignment of the memory zone (like in my test case, it
was an array of shorts in the data segment, so it was known to be on a
two-byte boundary at least). But that might be hard to code into gcc, I
don't know.
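To make the alignment point concrete, here is a rough C sketch of the
idea. It is not what gcc actually emits, and both function names
(fill_generic, fill_aligned2) are made up for illustration. The first
routine has to cope with any start address, so it needs byte-wise loops
before and after the word loop. The second assumes the caller already
knows the buffer is two-byte aligned with an even length (as with an
array of shorts), so each of those loops shrinks to at most one 16-bit
store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic fill: the start address is unknown, so a byte-wise prologue
   is needed to reach a 4-byte boundary and a byte-wise epilogue to
   finish the tail. */
static void fill_generic(void *dst, unsigned char value, size_t n)
{
    unsigned char *p = dst;
    uint32_t word = 0x01010101u * value;

    while (n > 0 && ((uintptr_t)p & 3) != 0) {   /* up to 3 byte stores */
        *p++ = value;
        n--;
    }
    while (n >= 4) {                             /* main word loop */
        memcpy(p, &word, 4);
        p += 4;
        n -= 4;
    }
    while (n > 0) {                              /* up to 3 byte stores */
        *p++ = value;
        n--;
    }
}

/* Hypothetical variant: the caller guarantees dst is 2-byte aligned
   and n is even. The prologue and epilogue each reduce to at most one
   16-bit store, with no byte loops at all. */
static void fill_aligned2(void *dst, unsigned char value, size_t n)
{
    unsigned char *p = dst;
    uint16_t half = (uint16_t)(0x0101u * value);
    uint32_t word = 0x01010101u * value;

    /* Assumed precondition, checked here only for illustration. */
    assert(((uintptr_t)p & 1) == 0 && (n & 1) == 0);

    if (n >= 2 && ((uintptr_t)p & 2) != 0) {     /* at most one store */
        memcpy(p, &half, 2);
        p += 2;
        n -= 2;
    }
    while (n >= 4) {                             /* main word loop */
        memcpy(p, &word, 4);
        p += 4;
        n -= 4;
    }
    if (n >= 2) {                                /* at most one store */
        memcpy(p, &half, 2);
    }
}
```

With a short array, fill_aligned2(buf, 0, sizeof buf) would give the
same result as fill_generic with fewer checks. Whether the compiler can
prove the alignment in a given case is of course the hard part.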
Also as I've been only giving bad news up to now, I wanted to say that
now that I've worked around the two issues I had with inlining and
with memset, the 3.1 snapshot does provide superior performance on my
libmpeg2 codebase, about 5% faster than 2.95.4, and that gets up to 8%
when using -fbranch-probabilities and 9% when using -mcpu=athlon-tbird
instead of the more generic -mcpu=pentiumpro. Nice work guys ! I am
still worried though, that other people will have the same trouble
with inlining as I did and not see all of the performance improvements
as a result.
Cheers,
--
Michel "Walken" LESPINASSE
Is this the best that god can do ? Then I'm not impressed.