This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Performance of Integer Multiplication on PIII
- To: <pete at ltoi dot iap dot physik dot tu-darmstadt dot de>, <kevin at atkinson dot dhs dot org>
- Subject: Re: Performance of Integer Multiplication on PIII
- From: "Tim Prince" <tprince at computer dot org>
- Date: Sat, 3 Nov 2001 08:34:28 -0800
- Cc: <gcc at gcc dot gnu dot org>
- References: <Pine.A32.4.40.0111031642220.12759-100000@ltoi.iap.physik.tu-darmstadt.de>
----- Original Message -----
From: <pete@ltoi.iap.physik.tu-darmstadt.de>
To: <kevin@atkinson.dhs.org>
Cc: <gcc@gcc.gnu.org>
Sent: Saturday, November 03, 2001 7:36 AM
Subject: Re: Performance of Integer Multiplication on PIII
> Hi!
>
> A slight step off-topic.
>
> -Regarding architectural differences between the (now) 4 different P6
> incarnations, P6 Model 1 to 4 (aka PPro, PII, PIII and PIII-Tualatin):
> There are no significant differences between them that would justify
> special -march sub-options (at least in 32-bit mode) for gcc.
> Within a small margin, they all perform equally if you "divide out"
> the differences in their environments (chipsets, RAM and RAM speed,
> CPU internal speed, ...).
>
> -With respect to the PIV (as public relations calls it): This is just a
> different CPU. So different that I suggest we are better off not
> biasing -march=i686 toward its various quirks. I.e., for this CPU one
> should indeed use a new arch sub-option.
> (I am not informed about the extent to which this is done in gcc-3.1.)
>
gcc-3.1 has -march=pentium4 for the P4. Of course, subsequent NetBurst
versions should reduce a few of the excessive operation costs.
>
> -On topic:
> (See below)
>
> Hope that helps,
>
> Peter Schorsch
>
> > When running these same tests on a Mobile Pentium MMX (using
> > -march=i586), gcc's code does outperform mine. I do not have anything
> > in between to run these tests on, so I would appreciate it if someone
> > with a Pentium Pro and PII (or is that the same thing as a Pentium
> > Pro?) could run them and post the results.
>
> From Agner Fog (http://www.agner.org/assem/) pentopt.zip
>
>                    PPlain  PMMX  PPro  PII  PIII
> IMUL latency         9      9     4     4    4
> IMUL throughput     1/9    1/9   1/1   1/1  1/1
>
> That means imul is pipelined on i686 ...
>
> > So I guess the lesson here is that on the PIII integer multiplication
> > is fast enough that special tricks to avoid it will hurt performance
> > instead of helping it.
>
Even on the P4, code which permits full pipelining will run well with
imul, while the add and shift sequences are preferable in contexts where
that is not possible. I haven't seen any compiler which is able to
distinguish those situations.
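
To make the two situations concrete, here is a minimal C sketch. The
function names (mul10_imul, mul10_shift_add, scale_all, chain) and the
constant 10 are made up for illustration and do not come from the tests
discussed in this thread. Whether the compiler really emits imul or a
shift/add/lea sequence for each function depends on the gcc version and
the -march/-mcpu setting, so treat the comments as intent, not as a
guarantee of the generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain multiply by a constant; the compiler is free to emit imul here. */
uint32_t mul10_imul(uint32_t x)
{
    return x * 10u;
}

/* Same value from shifts and one add: 10*x = 8*x + 2*x (mod 2^32). */
uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Independent multiplies: no product depends on an earlier one, so a
   pipelined multiplier can overlap them. */
void scale_all(uint32_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * 10u;
}

/* Dependent chain: every multiply needs the previous result, so the
   full multiply latency is paid on each iteration. */
uint32_t chain(uint32_t seed, size_t n)
{
    uint32_t h = seed;
    for (size_t i = 0; i < n; i++)
        h = h * 10u + 1u;
    return h;
}
```

Taking the P6 figures quoted above at face value (latency 4, throughput
1/1), n independent multiplies as in scale_all could in principle
complete in roughly n + 3 clocks, while the chain in chain needs about
4 clocks per multiply. That second case is where a short shift and add
sequence may still come out ahead.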