This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
[testresults][rfc] Re: Building for K6-2 CPU: i586 or i686?
- To: steve dot snyder at philips dot com, gcc at gcc dot gnu dot org
- Subject: [testresults][rfc] Re: Building for K6-2 CPU: i586 or i686?
- From: Dag B <dag at bredband dot no>
- Date: Wed, 20 Jun 2001 22:41:07 +0200
- References: <0056910012734000000002L102*@MHS>
I have done some testing with mpeg2dec* and gcc 2.95.3 to understand
which -mcpu/-march option gives the best performance on my AMD K6-2.
I present my findings here, for comment and abuse.
I chose mpeg2dec because it does limited I/O (*cough*, well, 1.6 MB/s here),
it is 'real' code, and because it produces some statistics at the end.
It uses both mmx and 3dnow. Also, in its current version it will process
the input mpeg file as fast as possible. In addition, it can be made to use
a dummy output module, so your gfx hardware isn't a limiting factor.
The results are very clear, at least for this particular code:
if you can't build for -march=k6, you should go for -mcpu=i686,
and clearly not -mcpu=i586.
The sad thing is that many (most?) source packages fail to detect a k6
through auto*-tools and end up optimizing for an i586...
I built 4 versions of mpeg2dec, with -mcpu=pentium, -mcpu=pentiumpro,
-mcpu=k6 and -march=k6. mpeg2dec links to libm, libc and libpthread
(among others), but I don't think it spends most of its time in those libraries
anyway. These libraries were not recompiled.
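For anyone who wants to repeat the runs, here is a minimal sketch of a timing harness. It assumes each binary accepts "-o nullslice <file>" as in the command lines in the results, and that it prints a summary line of the form "N frames decoded in S seconds"; the script's own command-line layout (clip, run count, binaries) is made up for illustration.

```python
import re
import subprocess
import sys

# Assumed summary format, taken from the result listings, e.g.
# "3659 frames decoded in 69.52 seconds (52.63 fps)"
SUMMARY = re.compile(r"(\d+) frames decoded in ([\d.]+) seconds")


def parse_summary(text):
    """Return (frames, seconds) from the last summary line, or None."""
    matches = SUMMARY.findall(text)
    if not matches:
        return None
    frames, seconds = matches[-1]
    return int(frames), float(seconds)


def run_once(binary, clip):
    """Decode the clip once with the dummy output module and parse the summary."""
    proc = subprocess.run([binary, "-o", "nullslice", clip],
                          capture_output=True, text=True)
    # The summary may go to stdout or stderr, so search both.
    return parse_summary(proc.stdout + proc.stderr)


def main(argv):
    # Hypothetical usage: harness.py CLIP RUNS BINARY [BINARY ...]
    clip, runs, binaries = argv[1], int(argv[2]), argv[3:]
    for binary in binaries:
        for _ in range(runs):
            result = run_once(binary, clip)
            if result is None:
                print(f"{binary}: no summary line found")
                continue
            frames, seconds = result
            print(f"{binary}: {frames} frames in {seconds:.2f} s "
                  f"({frames / seconds:.2f} fps)")


if __name__ == "__main__" and len(sys.argv) >= 4:
    main(sys.argv)
```

Running the builds in alternation rather than back to back would also spread any cache or background-load effects more evenly across them.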
I am unsure how/if gcc scheduling is affected by the 3dnow portion of the
mpeg2dec code when optimizing for i586/i686, but i586 is clearly slower
than i686 in any case (about 7% on the timings below). I also note that
there is no obvious gain in using -mcpu=k6 over -mcpu=i686.
Is using 3dnow and optimizing for i686 kosher?
If it isn't, does that mean there is more optimization to gain from gcc?
Are there any obvious flaws in my testing?
(I do hope mmx/3dnow is used even with the nullslice output target...)
Is it reasonable to consider the results (i686 is better than i586 for a k6)
valid for 'common' source code? (I.e. a general thing?)
Dag B
*) http://www.linuxvideo.org/mpeg2dec/
-mcpu=pentium:
> ./src/mpeg2dec.i586 -o nullslice /tmp/matrix.mpg
mpeg2dec-0.2.1-cvs (C) 2000-2001 Aaron Holtzman <aholtzma@ess.engr.uvic.ca>
Using MMX for IDCT transform
Using 3DNOW for motion compensation
3605 frames in 69.31 sec (52.01 fps), 75 last 0.52 sec (144.23 fps)
3659 frames decoded in 69.52 seconds (52.63 fps)
3659 frames decoded in 69.21 seconds (52.86 fps)
3659 frames decoded in 70.47 seconds (51.92 fps)
3659 frames decoded in 70.20 seconds (52.12 fps)
3659 frames decoded in 70.02 seconds (52.25 fps)
3659 frames decoded in 70.23 seconds (52.10 fps)
-mcpu=pentiumpro:
> ./src/mpeg2dec.i686 -o nullslice /tmp/matrix.mpg
3659 frames decoded in 65.69 seconds (55.70 fps)
3659 frames decoded in 65.14 seconds (56.17 fps)
3659 frames decoded in 64.14 seconds (57.04 fps)
3659 frames decoded in 65.82 seconds (55.59 fps)
3659 frames decoded in 65.14 seconds (56.17 fps)
3659 frames decoded in 65.38 seconds (55.96 fps)
-mcpu=k6:
> ./src/mpeg2dec.k6cpu -o nullslice /tmp/matrix.mpg
3659 frames decoded in 66.01 seconds (55.43 fps)
3659 frames decoded in 65.14 seconds (56.17 fps)
3659 frames decoded in 66.53 seconds (54.99 fps)
-march=k6:
> ./src/mpeg2dec.k6 -o nullslice /tmp/matrix.mpg
3659 frames decoded in 64.68 seconds (56.57 fps)
3659 frames decoded in 64.53 seconds (56.70 fps)
3659 frames decoded in 63.92 seconds (57.24 fps)
3659 frames decoded in 64.12 seconds (57.06 fps)
3659 frames decoded in 64.58 seconds (56.65 fps)
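
To make the comparison easier to read, the timings above can be reduced to a mean, a spread and a speedup relative to the -mcpu=pentium build. This is only a sketch: the numbers are copied from the listings, and the function and field names are mine.

```python
from statistics import mean, stdev

FRAMES = 3659  # frames decoded per run, as reported in the listings

# Seconds per run, copied from the listings above.
RUNS = {
    "-mcpu=pentium":    [69.52, 69.21, 70.47, 70.20, 70.02, 70.23],
    "-mcpu=pentiumpro": [65.69, 65.14, 64.14, 65.82, 65.14, 65.38],
    "-mcpu=k6":         [66.01, 65.14, 66.53],
    "-march=k6":        [64.68, 64.53, 63.92, 64.12, 64.58],
}


def summarize(runs, frames=FRAMES, baseline="-mcpu=pentium"):
    """Per-flag mean time, sample standard deviation, fps and speedup vs baseline."""
    base = mean(runs[baseline])
    summary = {}
    for flag, secs in runs.items():
        avg = mean(secs)
        summary[flag] = {
            "mean_s": avg,
            "stdev_s": stdev(secs),
            "fps": frames / avg,
            "speedup": base / avg,
        }
    return summary


if __name__ == "__main__":
    for flag, s in summarize(RUNS).items():
        print(f"{flag:18s} {s['mean_s']:6.2f} s  +/- {s['stdev_s']:.2f}  "
              f"{s['fps']:5.2f} fps  x{s['speedup']:.3f}")
```

On these timings the pentiumpro build comes out roughly 7% faster than the pentium build and -march=k6 roughly 9% faster. The gap between -mcpu=k6 and -mcpu=pentiumpro is about the size of the run-to-run spread, which fits the remark that there is no obvious gain from -mcpu=k6.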