This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



[testresults][rfc] Re: Building for K6-2 CPU: i586 or i686?


I have done some testing with mpeg2dec* and gcc 2.95.3 to understand 
which -mcpu/-march option gives the best performance on my AMD K6-2.
I present my findings here, for comment and abuse.

I chose mpeg2dec because it does limited I/O (*cough*, well, 1.6 MB/s here), 
it is 'real' code, and because it produces some statistics at the end. 
It uses both mmx and 3dnow. Also, in its current version it will process 
the input mpeg file as fast as possible. In addition, it can be made to use 
a dummy output module, so your gfx hardware isn't a limiting factor.

The results are very clear, at least for this particular code: 
if you can't build for -march=k6, you should go for -mcpu=i686, 
and clearly not -mcpu=i586. 
The sad thing is that many (most?) source packages fail to detect a k6
through the auto* tools and end up optimizing for an i586...

I built 4 versions of mpeg2dec, with -mcpu=pentium, -mcpu=pentiumpro, 
-mcpu=k6 and -march=k6. mpeg2dec links to libm, libc and libpthread 
(among others), but I don't think it spends most time in those libraries 
anyway. These libraries were not recompiled.
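
For anyone who wants to repeat this, the runs could be automated roughly as 
follows. This is only a sketch: the helper names parse_summary and 
run_benchmark are made up for illustration, and it assumes that mpeg2dec 
prints its final "frames decoded in ... seconds" line on stdout or stderr 
(both are captured, since I have not checked which one it uses).

```python
import re
import subprocess

# Matches the final summary line, e.g.
# "3659 frames decoded in 69.52 seconds (52.63 fps)"
SUMMARY_RE = re.compile(
    r"(\d+) frames decoded in ([\d.]+) seconds \(([\d.]+) fps\)"
)


def parse_summary(output):
    """Return (frames, seconds, fps) from the last summary line, or None."""
    matches = SUMMARY_RE.findall(output)
    if not matches:
        return None
    frames, seconds, fps = matches[-1]
    return int(frames), float(seconds), float(fps)


def run_benchmark(binary, mpeg_file, runs=5):
    """Run one mpeg2dec build several times; return the decode times in seconds."""
    times = []
    for _ in range(runs):
        proc = subprocess.run(
            [binary, "-o", "nullslice", mpeg_file],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # assumption: summary may be on either stream
            text=True,
        )
        result = parse_summary(proc.stdout)
        if result is not None:
            times.append(result[1])
    return times
```

Calling run_benchmark("./src/mpeg2dec.i586", "/tmp/matrix.mpg") and so on for 
each of the four binaries would give one list of timings per build.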

I am unsure how/if gcc scheduling is affected by the 3dnow portion of the 
mpeg2dec code when optimizing for i586/i686, but i586 is clearly slower 
than i686 in any case (about 70 seconds against about 65 seconds per run, 
roughly 7%). I also note that there is no obvious gain in using
-mcpu=k6 over -mcpu=i686.

Is using 3dnow and optimizing for i686 kosher?
If it isn't, does that mean there is more optimization to gain from gcc?
Are there any obvious flaws to my testing?
(I do hope mmx/3dnow is used even with the nullslice output target...)
Is it reasonable to consider the results (i686 is better than i586 for a k6)
valid for 'common' source code? (I.e. as a general rule?)

Dag B
*) http://www.linuxvideo.org/mpeg2dec/



-mcpu=pentium:
> ./src/mpeg2dec.i586 -o nullslice /tmp/matrix.mpg
mpeg2dec-0.2.1-cvs (C) 2000-2001 Aaron Holtzman <aholtzma@ess.engr.uvic.ca>
Using MMX for IDCT transform
Using 3DNOW for motion compensation
3605 frames in 69.31 sec (52.01 fps), 75 last 0.52 sec (144.23 fps)
3659 frames decoded in 69.52 seconds (52.63 fps)

3659 frames decoded in 69.21 seconds (52.86 fps)
3659 frames decoded in 70.47 seconds (51.92 fps)
3659 frames decoded in 70.20 seconds (52.12 fps)
3659 frames decoded in 70.02 seconds (52.25 fps)
3659 frames decoded in 70.23 seconds (52.10 fps)


-mcpu=pentiumpro:
> ./src/mpeg2dec.i686 -o nullslice /tmp/matrix.mpg
3659 frames decoded in 65.69 seconds (55.70 fps)
3659 frames decoded in 65.14 seconds (56.17 fps)
3659 frames decoded in 64.14 seconds (57.04 fps)
3659 frames decoded in 65.82 seconds (55.59 fps)
3659 frames decoded in 65.14 seconds (56.17 fps)
3659 frames decoded in 65.38 seconds (55.96 fps)


-mcpu=k6:
> ./src/mpeg2dec.k6cpu -o nullslice /tmp/matrix.mpg
3659 frames decoded in 66.01 seconds (55.43 fps)
3659 frames decoded in 65.14 seconds (56.17 fps)
3659 frames decoded in 66.53 seconds (54.99 fps)


-march=k6:
> ./src/mpeg2dec.k6 -o nullslice /tmp/matrix.mpg
3659 frames decoded in 64.68 seconds (56.57 fps)
3659 frames decoded in 64.53 seconds (56.70 fps)
3659 frames decoded in 63.92 seconds (57.24 fps)
3659 frames decoded in 64.12 seconds (57.06 fps)
3659 frames decoded in 64.58 seconds (56.65 fps)
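
To make the comparison easier to read, the "frames decoded" timings above can 
be averaged per build. The snippet below is just a convenience sketch over 
the numbers pasted in this message. The dictionary layout and variable names 
are my own, and a plain mean over 3 to 6 runs says nothing about statistical 
significance.

```python
from statistics import mean

FRAMES = 3659  # frames decoded in every run above

# Decode times in seconds, copied from the "frames decoded" lines above.
times = {
    "-mcpu=pentium": [69.52, 69.21, 70.47, 70.20, 70.02, 70.23],
    "-mcpu=pentiumpro": [65.69, 65.14, 64.14, 65.82, 65.14, 65.38],
    "-mcpu=k6": [66.01, 65.14, 66.53],
    "-march=k6": [64.68, 64.53, 63.92, 64.12, 64.58],
}

# Mean decode time per build, and speedup relative to the i586 build.
means = {flag: mean(runs) for flag, runs in times.items()}
baseline = means["-mcpu=pentium"]

for flag, m in means.items():
    print(
        f"{flag:18s} mean {m:6.2f} s  "
        f"({FRAMES / m:5.2f} fps)  {baseline / m:.3f}x vs -mcpu=pentium"
    )
```

On these numbers the means come out at roughly 69.9 s for -mcpu=pentium, 
65.2 s for -mcpu=pentiumpro, 65.9 s for -mcpu=k6 and 64.4 s for -march=k6.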

