This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Performance of Integer Multiplication on PIII (Results for gcc-2.95 & Athlon)



> >
> > Hi,
> Hi,
> Could you please try the patch I've attached to see if it solves the
> slowdown?  It should, IMO.
Unfortunately not. Look:

Result for pent.s (AMD K7 Model 1 at 600 MHz)

hgcc: gcc-2.95.2 with haifa  scheduler
ngcc: gcc-2.95.2 with normal scheduler

hgcc -s -o imulp pent.s
ls -l imulp: 3368 Nov  5 17:29 imulp

2366519
3.250000

ngcc -s -o imulp pent.s
ls -l imulp: 3416 Nov  5 17:29 imulp

2366519
3.250000

And 3.25 s is slower than 3.03 s or 2.57 s:
gcc3.0.2 -O2 -march=athlon -s -o imul imul.c read_empty.c read.c
 Loop: 0.89, Code: 3.03
 Clocks: 22.57

gcc-2.95.2,3 (haifa):
 Loop: 1.01, Code: 2.57
 Clocks: 19.15

> How much of this speedup is accounted to fomit-frame-pointer and how
> much to the alignment changes?

For this (afterwards generalized) question, I got:

Reference: (note: MHz value for clocks in imul.c not changed, for consistency)

hgcc -s -o imul imul.c read_empty.c read.c -fomit-frame-pointer
-malign-loops=2

ls -l imul : 3672 Nov  5 17:32 imul

 Loop: 2.47, Code: 1.34
 Clocks: 9.98

hgcc -s -o imul imul.c read_empty.c read2.c -fomit-frame-pointer
-malign-loops=2 -malign-functions=2 -malign-jumps=2 -march=i686 -O2

ls -l imul : 3688 Nov  5 17:34 imul

 Loop: 2.46, Code: 0.90
 Clocks: 6.70
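
The Clocks column appears to follow the Code column by a constant factor.
A minimal Python sketch, using only (Code, Clocks) pairs copied from the
results above, checks that. Reading this factor as a fixed MHz-based
constant inside imul.c is an assumption, since imul.c itself is not shown
here.

```python
# (Code seconds, Clocks) pairs copied from the results listed above.
pairs = [(3.03, 22.57), (2.57, 19.15), (1.34, 9.98), (0.90, 6.70)]

# If Clocks is just Code scaled by a constant, every ratio should agree.
ratios = [clocks / code for code, clocks in pairs]

for (code, clocks), ratio in zip(pairs, ratios):
    print(f"Code {code:.2f} s -> Clocks {clocks:.2f} (ratio {ratio:.3f})")

# Rounding of the two-decimal figures explains small differences.
ratio_spread = max(ratios) - min(ratios)
print(f"ratio spread: {ratio_spread:.3f}")
```

If the ratios agree to within rounding, comparing Code times and comparing
Clocks is the same comparison.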

Base: -O2 -march=i686
 Loop: 1.01, Code: 2.57
 Clocks: 19.15

Base & -fomit-frame-pointer:
 Loop: 0.78, Code: 2.92
 Clocks: 21.75

Base & -malign-loops=2:
 Loop: 1.01, Code: 2.91
 Clocks: 21.68

Base & -malign-functions=2:
 Loop: 1.01, Code: 2.57
 Clocks: 19.15

Base & -malign-jumps=2: (in words: same result as above)
 Loop: 1.01, Code: 2.57
 Clocks: 19.15

Base & -malign-functions=2 & -malign-jumps=2:
 Loop: 1.01, Code: 2.57
 Clocks: 19.15

-------

Reference:

hgcc -s -o imul imul.c read_empty.c read2.c -fomit-frame-pointer
-malign-loops=2 -malign-functions=2 -malign-jumps=2 -march=i686 -O2
 Loop: 2.46, Code: 0.90
 Clocks: 6.70

Best from above:

hgcc -s -o imul imul.c read_empty.c read2.c -march=i686 -O2
-malign-functions=2 -malign-jumps=2

 Loop: 1.01, Code: 2.35
 Clocks: 17.51

(... There is no royal road to math ...)

Base: -O2 -march=i686
 Loop: 1.01, Code: 2.35
 Clocks: 17.51

Base  & -fomit-frame-pointer:
 Loop: 0.78, Code: 2.36
 Clocks: 17.58

Base & -malign-loops=2:
 Loop: 1.01, Code: 2.75
 Clocks: 20.49

Base & -malign-functions=2:
 Loop: 1.01, Code: 2.35
 Clocks: 17.51

Base & -malign-jumps=2:
 Loop: 1.01, Code: 2.35
 Clocks: 17.51

Base & -malign-functions=2 & -malign-jumps=2:
 Loop: 1.01, Code: 2.35
 Clocks: 17.51

Base & -malign-functions=2 & -malign-loops=2:
 Loop: 1.01, Code: 2.76
 Clocks: 20.56

Base & -malign-loops=2 & -malign-jumps=2
  <unstable>
  Loop: 1.00, Code: 2.80
  Clocks: 20.86
  Loop: 1.01, Code: 2.63
  Clocks: 19.59
  Loop: 1.01, Code: 2.65
  Clocks: 19.74
  Loop: 1.01, Code: 2.75
  Clocks: 20.49
  Loop: 1.01, Code: 2.57
  Clocks: 19.15
 .... finally, this may be adequate
 Loop: 1.00, Code: 2.75
 Clocks: 20.49
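
For an unstable series like this one, a median and a spread say more than
any single run. A small Python sketch over the six Code times listed above
(the choice of median and max-min spread as summary is an assumption, not
something done in the original measurements):

```python
import statistics

# Code times (seconds) of the six runs of
# "Base & -malign-loops=2 & -malign-jumps=2" listed above.
code_times = [2.80, 2.63, 2.65, 2.75, 2.57, 2.75]

median_s = statistics.median(code_times)      # robust central value
spread_s = max(code_times) - min(code_times)  # run-to-run variation

print(f"median: {median_s:.2f} s")
print(f"spread: {spread_s:.2f} s "
      f"({100 * spread_s / median_s:.1f}% of median)")
```

A spread of this size is comparable to the differences between several of
the option sets, so single runs of this combination should be read with
care.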

... hm, 4! = 24, but we know that this time no single option yields
peak performance ...

It seems, -fomit-frame-pointer is good here, so:

Base & -fomit-frame-pointer & -malign-functions=2:
 Loop: 0.78, Code: 2.36
 Clocks: 17.58

There is no ...

Base & -fomit-frame-pointer & -malign-functions=2 & -malign-jumps=2:
 Loop: 0.79, Code: 2.34
 Clocks: 17.43

There is no ... 12 out of 24 + 1 permutations so far, let's try these
"unstable" options from above

Base & -fomit-frame-pointer -malign-loops=2 -malign-jumps=2
 Loop: 2.47, Code: 0.89
 Clocks: 6.63

Break-even with icc reached ....
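
The option sets tried above can also be enumerated mechanically. A minimal
Python sketch, assuming the order of the options does not matter: in that
case there are 2^4 = 16 subsets of the four options rather than 4! = 24
orderings. The command prefix is copied from the hgcc lines above; the
script only prints the command lines and does not run them.

```python
from itertools import combinations

# Command prefix and base options as used in the runs above.
PREFIX = "hgcc -s -o imul imul.c read_empty.c read2.c"
BASE = "-O2 -march=i686"

# The four options under test.
OPTIONS = [
    "-fomit-frame-pointer",
    "-malign-loops=2",
    "-malign-functions=2",
    "-malign-jumps=2",
]

# All subsets, from the empty set (Base only) up to all four options.
subsets = [
    combo
    for size in range(len(OPTIONS) + 1)
    for combo in combinations(OPTIONS, size)
]

for combo in subsets:
    print(" ".join([PREFIX, BASE, *combo]))

print(f"{len(subsets)} option sets in total")
```

Feeding each printed line to the shell and recording Loop/Code/Clocks
would cover every combination once.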

-------

Now with gcc-3.0.2:

Reading specs from /opt/gcc3.0.2/lib/gcc-lib/i686-pc-linux-gnu/3.0.2/specs
Configured with: ../gcc-3.0.2/configure --prefix=/opt/gcc3.0.2
--exec-prefix=/op
Thread model: single
gcc version 3.0.2

Best from above
gcc -s -o imul imul.c read_empty.c read2.c -march=i686 -O2
-fomit-frame-pointer -malign-loops=2 -malign-jumps=2
 Loop: 0.78, Code: 2.99
 Clocks: 22.27

(... There is no easy way ...)

Reference:
-ffplfj=2:= -fomit-frame-pointer -malign-loops=2 -malign-functions=2
            -malign-jumps=2

gcc3.0.2 -O2 -march=athlon -ffplfj=2
 Loop: 2.35, Code: 0.90
 Clocks: 6.70

(checked again, ok)

Base: -march=athlon -O2
 Loop: 0.90, Code: 2.57
 Clocks: 19.15

Base & -fomit-frame-pointer:
 Loop: 0.79, Code: 2.68
 Clocks: 19.97

Base & -malign-functions=2:
 Loop: 0.89, Code: 2.36
 Clocks: 17.58

Base & -malign-loops=2:
 Loop: 0.90, Code: 2.79
 Clocks: 20.78

Base & -malign-jumps=2:
 Loop: 0.89, Code: 2.58
 Clocks: 19.22

Base & -malign-functions=2 & -malign-jumps=2:
 Loop: 0.89, Code: 2.36
 Clocks: 17.58

Base & -malign-functions=2 & -malign-loops=2:
 Loop: 0.89, Code: 2.81
 Clocks: 20.93

Base & -malign-loops=2 & -malign-jumps=2:
 Loop: 0.90, Code: 2.79
 Clocks: 20.78

 ... This cost some patience ...

Anyway, aligning everything on a 4-byte boundary is far from what AMD's
tuning guide tells you (?)

But now we top ourselves:

Base & -fomit-frame-pointer -malign-loops=2 -malign-jumps=2
-mpreferred-stack-boundary=2
 Loop: 2.35, Code: 0.79
 Clocks: 5.89

Not bad: a gcc-driven AMD K7 (Athlon) Model 1 running at 600 MHz is
   1.97/0.79 = 2.5
times faster than an icc-driven P6 Model III (Coppermine?)
running at 500 MHz. If gcc is handled with care!

And that's the good news, isn't it ;-)
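
The ratio above compares wall-clock times on machines with different clock
rates. A short Python sketch of the same arithmetic, plus a clock-normalized
variant; the normalization (seconds times MHz as a proxy for cycles) is an
added assumption and ignores all other differences between the two systems.

```python
# Code times quoted above (seconds).
icc_p3_seconds = 1.97   # icc build on the 500 MHz P6
gcc_k7_seconds = 0.79   # best gcc build on the 600 MHz Athlon

p3_mhz = 500
k7_mhz = 600

# Plain wall-clock ratio, as computed in the text.
speedup = icc_p3_seconds / gcc_k7_seconds

# Same ratio with each time scaled by its clock rate, i.e. a rough
# cycles-versus-cycles comparison (assumption: time * MHz ~ cycles).
per_clock_speedup = (icc_p3_seconds * p3_mhz) / (gcc_k7_seconds * k7_mhz)

print(f"wall-clock speedup: {speedup:.2f}")
print(f"clock-normalized:   {per_clock_speedup:.2f}")
```

Part of the wall-clock factor therefore comes from the higher clock rate
alone.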

Well, it seems that the main problems are threefold:

  - Alignment issues, which will also depend on the underlying glibc.

   {Hi Paolo (Carlini), listening? What was the quintessence of your flops
    example?}

  - We have no distinct idea about what we should expect!

    I.e., how much time (processor clocks) should this example need?
    For the worst-case (no pipelining at all) and best-case (no
    stall at all) scenarios.

    ... weren't those numbers that icc gives as a comment
    the clocks needed?

  - gcc-3.0 x86 backend problems compared with gcc-2.95
   {at least, another example of that}


