This is the mail archive of the
mailing list for the GCC project.
Re: Performance of Integer Multiplication on PIII
> On Mon, 5 Nov 2001, Jan Hubicka wrote:
> > The attached patch should fix all three problems. Your testcase still
> > does use some unwound multiplies, but runs faster on the Celeron machines
> > here in the lab than the assembly one you supplied.
> Ok. Here are some more results including using your code.
> $ gcc-3.0.2 -O2 -march=i686 read.c read-empty.c t.c && a.out
> Loop: 1.33, Code: 4.72
> Clocks: 35.16
> $ gcc -O2 -march=i686 read.c read-empty.c t.c && a.out
> Loop: 1.32, Code: 3.59
> Clocks: 26.74
> $ gcc -O2 -march=i686 read.hand.s read-empty.c t.c && a.out
> Loop: 1.30, Code: 1.95
> Clocks: 14.53
> $ gcc -O2 -march=i686 read.new.s read-empty.c t.c && a.out
> Loop: 1.32, Code: 2.32
> Clocks: 17.28
> So, my code still does better on my machine; however, the new assembly
> output is certainly acceptable, especially since you say it outperforms my
> code on your machine. A few clock cycles won't make that much difference.
I guess it is because I used -fomit-frame-pointer in my tests.
Your assembly code does not use ebp either, so I guess the comparison is fair.
I hope that accounts for the few percent difference.