This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Performance of Integer Multiplication on PIII
- To: <pete at ltoi dot iap dot physik dot tu-darmstadt dot de>, <kevin at atkinson dot dhs dot org>
- Subject: Re: Performance of Integer Multiplication on PIII
- From: "Tim Prince" <tprince at computer dot org>
- Date: Sat, 3 Nov 2001 08:34:28 -0800
- Cc: <gcc at gcc dot gnu dot org>
- References: <Pine.A32.firstname.lastname@example.org>
----- Original Message -----
Sent: Saturday, November 03, 2001 7:36 AM
Subject: Re: Performance of Integer Multiplication on PIII
> A slight step off-topic.
> -Regarding architectural differences between the (now) 4 different P6
> incarnations P6 Model 1 to 4 (aka PPro, PII, PIII and PIII-Tualatin):
> There are no significant differences between them that justify
> -march sub options (at least in 32 bit mode) for gcc.
> Within a small margin, they all perform equally, if you "divide out"
> differences of the environments (Chipsets, RAM & RAM-speed, CPU
> -In respect of the PIV (as public relations call it): This is just a
> different CPU. So different that I suggest we are better off not
> to bias -march=i686 for its various quirks. I.e.: For this CPU, one
> should indeed use a new arch sub option.
> (I am not informed to what extent this is done in gcc-3.1)
gcc-3.1 has -march=pentium4 for the P4. Of course, subsequent NetBurst
versions should reduce a few of the excessive operation costs.
> -On topic:
> (See below)
> Hope that helps,
> Peter Schorsch
> > When running these same tests on a Mobile Pentium MMX, the
> > Gcc code does outperform mine. I do not have anything in between to run
> > these tests on, so I would appreciate it if someone with a Pentium Pro or
> > PII (or is that the same thing as a Pentium Pro?) could run them and report
> > the results.
> From Agner Fog (http://www.agner.org/assem/) pentopt.zip
>                   PPlain  PMMX  PPro  PII   PIII
> IMUL latency      9       9     4     4     4
> IMUL throughput   1/9     1/9   1/1   1/1   1/1
> That means, imul is pipelined on i686 ...
> > So I guess the lesson here is that on PIII integer multiplication is fast
> > enough that doing special tricks to avoid integer multiplication will hurt
> > performance instead of helping it.
Even on the P4, code which permits full pipelining will run well with
imul, while the add and shift sequences are preferable in contexts where
that is not possible. I haven't seen any compiler which is able to
distinguish those situations.
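
To make the two situations concrete, here is a minimal C sketch. It is not
taken from the tests discussed in this thread; the function names and the
constant 10 are chosen purely for illustration. It shows a multiply written
directly and as an add-and-shift sequence, then a loop whose multiplies are
independent (so they can overlap in a pipelined multiplier) next to a loop
where each multiply has to wait for the previous result. Going by the figures
quoted above for the P6 family (latency 4, throughput 1/1), the dependent loop
would cost roughly 4 cycles per multiply, while the independent one can
approach 1 per cycle. Note that the compiler remains free to turn either
source form into the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by 10, written so the compiler may emit a single imul. */
static uint32_t mul10_imul(uint32_t x)
{
    return x * 10u;
}

/* The same product as an add-and-shift sequence: 10*x = 8*x + 2*x. */
static uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Independent multiplies: no product depends on an earlier product,
 * so a pipelined multiplier can start a new one before the last finishes.
 * Only the cheap additions into s form a dependency chain. */
static uint32_t sum_independent(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * 10u;
    return s;
}

/* Dependent chain: every multiply needs the previous result,
 * so the multiply latency, not its throughput, bounds the loop. */
static uint32_t chain_dependent(uint32_t x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x = x * 10u + 1u;
    return x;
}

/* Sanity check: both forms agree, including on unsigned wraparound. */
static void check_equivalence(void)
{
    for (uint32_t x = 0; x < 100000u; x++)
        assert(mul10_imul(x) == mul10_shift_add(x));
    assert(mul10_imul(0xFFFFFFFFu) == mul10_shift_add(0xFFFFFFFFu));
}
```

Comparing the assembly generated for these functions (for example with
gcc -O2 -S) at different -march settings shows which form the compiler picks
in each case.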