This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



double alignment/benchmark data



coming back to the "benchmark" program posted recently, I wanted to add some
more datapoints, showing how irregular fp performance currently is:

all tests were repeatedly done on an idle p-ii 333 system, using pgcc and
glibc-2.0.94+ (this version of pgcc is based on current cvs sources). the
granularity of the mflops value is ~+-2.

mflops	flags
======	=============================================================
144.41	-O6 -mstack-align-double -malign-double -funroll-loops
142.58	-O6 -mno-stack-align-double -malign-double -funroll-all-loops
142.58	-O6 -mno-stack-align-double -malign-double -funroll-loops
142.58	-O6 -mstack-align-double -mno-align-double -funroll-loops
140.80	-O6 -mno-stack-align-double -funroll-loops

so far, there is not much difference between alignment options on and
off; it seems the alignment options neither really hurt nor really help.

Now let's add -fschedule-insns, since the x86 has "plenty" of fp-regs:

152.22	-O6 -fschedule-insns -mno-stack-align-double -malign-double -funroll-loops
154.30	-O6 -fschedule-insns -mno-stack-align-double -malign-double -funroll-all-loops
152.22	-O6 -fschedule-insns -mstack-align-double -mno-align-double -funroll-all-loops

whoa. integer code usually slows down considerably with -fschedule-insns... let's
remove -malign-double:

77.15	-O6 -fschedule-insns -mno-stack-align-double -mno-align-double -funroll-loops

Interesting. -mno-align-double + -fschedule-insns halves performance.

154.30	-O6 -fschedule-insns -mstack-align-double -funroll-loops
76.63	-O6 -fschedule-insns -mno-stack-align-double -funroll-loops

likewise -mno-stack-align-double + -fschedule-insns.

So I'd say fp performance is quite random. Still, it seems
-mstack-align-double is not very useful, since -malign-double was
equally successful in all cases. But this changes as soon as you add a
dummy int argument to the function "trafo":

-void trafo(double *a,const int n,double *wksp,double *const ops)
+void trafo(double *a,const int n,double *wksp,double *const ops, int dummy)

-     trafo(x,N,x+N,&ops);
+     trafo(x,N,x+N,&ops,0);

mflops	flags
======	=============================================================
71.29	-O6 -fschedule-insns -mno-stack-align-double -mno-align-double -funroll-all-loops
71.29	-O6 -fschedule-insns -mno-stack-align-double -malign-double -funroll-all-loops
154.30	-O6 -fschedule-insns -mstack-align-double -mno-align-double -funroll-all-loops
154.30	-O6 -fschedule-insns -mstack-align-double -malign-double -funroll-all-loops

So, at least for _predictable_ fp performance, we need both switches.
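
One plausible reading of why a single dummy int matters: on 32-bit x86
every argument occupies a 4-byte stack slot, so one extra argument shifts
the callee's frame by 4 bytes and flips the 8-byte alignment of anything
placed in it. The sketch below is only a model of that arithmetic. The
helper names, the frame layout (4-byte return address, 4-byte saved frame
pointer, first double directly below it) and the example stack address
are assumptions for illustration, not what gcc actually emits:

```c
#include <stdint.h>

/* Nonzero if addr is a multiple of 8, i.e. suitable for an aligned double. */
static int is_aligned8(uintptr_t addr)
{
    return (addr & 7u) == 0;
}

/*
 * Hypothetical model of a 32-bit x86 call: the caller pushes nargs
 * 4-byte argument slots, then the call pushes a 4-byte return address.
 * The stack grows towards lower addresses.
 */
static uintptr_t sp_on_entry(uintptr_t caller_sp, unsigned nargs)
{
    return caller_sp - 4u * (nargs + 1u);
}

/*
 * Address of the first 8-byte local, assuming the callee saves a 4-byte
 * frame pointer and places the local directly below it (an assumption;
 * the real frame layout depends on the compiler and its options).
 */
static uintptr_t first_double_local(uintptr_t caller_sp, unsigned nargs)
{
    return sp_on_entry(caller_sp, nargs) - 4u - 8u;
}
```

For the same caller stack pointer, first_double_local() with 4 and with
5 arguments always differs by 4 bytes, so exactly one of the two is
8-byte aligned. Which one it is depends on where the caller's stack
happens to sit, which would fit the "random" results above.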

Yet unsolved questions:

- integer code slows down with -mstack-align-double? (I may add some data to
  this case)
- aligning argument slots/copying to spill slots - is it worth the effort?
  (I'll have to look into some fp-intensive programs to see how real that
  problem is. unfortunately, neither solution works in current gcc sources)
- how about doing the first scheduling pass only on fp-insns?
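
To make the "copying to spill slots" idea concrete, here is a hand-written
sketch of what such a copy amounts to: a double that arrives in a slot
that may only be 4-byte aligned is copied once into an explicitly 8-byte
aligned local, and the hot code then works on the local. This is an
illustration, not how gcc would implement it; the function name is made
up, and _Alignas needs a C11 compiler:

```c
#include <string.h>

/*
 * Load a double from a slot that may only be 4-byte aligned by first
 * copying it into a local that is explicitly 8-byte aligned.
 * memcpy is used so the possibly misaligned read is well defined.
 */
static double load_via_aligned_copy(const void *slot)
{
    _Alignas(8) double tmp;
    memcpy(&tmp, slot, sizeof tmp);
    return tmp;
}
```

The copy costs one extra load and store per argument, so it can only pay
off when the value is then used many times, e.g. inside an unrolled loop.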

Hope this was useful.

PS: compiling glibc with -malign-double breaks current libstdc++ (both 2.8.1
    and the egcs version, due to misalignment of _offset in struct _IO_FILE)
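
    The kind of mismatch behind this can be sketched as follows.
    -malign-double changes the alignment of 8-byte types (double, long
    long) inside structs from 4 to 8 bytes, so two pieces of code
    compiled with different settings can disagree on where a member
    lives. The helpers below only model the offset arithmetic; the
    preceding-member sizes used with them are hypothetical and are not
    the real layout of struct _IO_FILE:

```c
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Offset of an 8-byte member that follows prev_end bytes of earlier
 * members, when 8-byte types are aligned to member_align bytes
 * (4 without -malign-double, 8 with it).
 */
static size_t offset_of_8byte_member(size_t prev_end, size_t member_align)
{
    return align_up(prev_end, member_align);
}
```

    With 4 bytes of earlier members, offset_of_8byte_member(4, 4) is 4
    but offset_of_8byte_member(4, 8) is 8, so a library and its user
    built with different settings would read the member at different
    addresses.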

      -----==-                                              |
      ----==-- _                                            |
      ---==---(_)__  __ ____  __       Marc Lehmann       +--
      --==---/ / _ \/ // /\ \/ /       pcg@goof.com       |e|
      -=====/_/_//_/\_,_/ /_/\_\                          --+
    The choice of a GNU generation                        |
                                                          |

