This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
double alignment/benchmark data
- To: egcs at cygnus dot com
- Subject: double alignment/benchmark data
- From: Marc Lehmann <pcg at goof dot com>
- Date: Mon, 6 Jul 1998 03:05:31 +0200
Coming back to the "benchmark" program posted recently, I wanted to add some
more data points, showing how irregular fp performance currently is.
All tests were run repeatedly on an idle P-II 333 system, using pgcc and
glibc-2.0.94+ (this version of pgcc is based on current cvs sources). The
granularity of the mflops value is about +-2.
mflops flags
====== =============================================================
144.41 -O6 -mstack-align-double -malign-double -funroll-loops
142.58 -O6 -mno-stack-align-double -malign-double -funroll-all-loops
142.58 -O6 -mno-stack-align-double -malign-double -funroll-loops
142.58 -O6 -mstack-align-double -mno-align-double -funroll-loops
140.80 -O6 -mno-stack-align-double -funroll-loops
So far there is not much difference between alignment options on and
off; it seems the alignment options neither really hurt nor really help.
Now let's add -fschedule-insns, since the x86 has "plenty" of fp-regs:
152.22 -O6 -fschedule-insns -mno-stack-align-double -malign-double -funroll-loops
154.30 -O6 -fschedule-insns -mno-stack-align-double -malign-double -funroll-all-loops
152.22 -O6 -fschedule-insns -mstack-align-double -mno-align-double -funroll-all-loops
Wow. Integer code usually slows down considerably with -fschedule-insns... let's
remove -malign-double:
77.15 -O6 -fschedule-insns -mno-stack-align-double -mno-align-double -funroll-loops
Interesting. -mno-align-double + -fschedule-insns halves performance.
154.30 -O6 -fschedule-insns -mstack-align-double -funroll-loops
76.63 -O6 -fschedule-insns -mno-stack-align-double -funroll-loops
Likewise for -mno-stack-align-double + -fschedule-insns.
So I'd say fp performance is quite random. Yet it seems
-mstack-align-double is not very useful, since -malign-double was
equally successful in all cases. But this changes as soon as you add a
dummy int argument to the function "trafo":
-void trafo(double *a,const int n,double *wksp,double *const ops)
+void trafo(double *a,const int n,double *wksp,double *const ops, int dummy)
- trafo(x,N,x+N,&ops);
+ trafo(x,N,x+N,&ops,0);
mflops flags
====== =============================================================
71.29 -O6 -fschedule-insns -mno-stack-align-double -mno-align-double -funroll-all-loops
71.29 -O6 -fschedule-insns -mno-stack-align-double -malign-double -funroll-all-loops
154.30 -O6 -fschedule-insns -mstack-align-double -mno-align-double -funroll-all-loops
154.30 -O6 -fschedule-insns -mstack-align-double -malign-double -funroll-all-loops
So, at least for _predictable_ fp performance, we need both switches.
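
A rough way to check what the dummy argument does is to look at where a
double local actually lands on the stack. The C sketch below is not part of
the benchmark; is_aligned, probe and probe_dummy are hypothetical helper
names. It assumes the usual explanation (one extra 4-byte argument shifts the
frame on 32-bit x86), which the numbers above suggest but do not prove.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of align (which must be a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical probe: is a double local 8-byte aligned in this frame? */
static int probe(int n)
{
    volatile double tmp = n;   /* volatile keeps tmp in a stack slot */
    return is_aligned((const void *)&tmp, 8);
}

/* Same probe with one extra int argument, mirroring the "trafo" change. */
static int probe_dummy(int n, int dummy)
{
    (void)dummy;               /* unused, only there to shift the frame */
    volatile double tmp = n;
    return is_aligned((const void *)&tmp, 8);
}
```

Calling probe(1) and probe_dummy(1, 0) from main and printing both results
for each flag combination shows whether the local moved off an 8-byte
boundary. The functions have to stay out of line for this to mean anything
(e.g. build with -fno-inline), and on targets that always align the stack to
8 bytes or more both probes will simply report 1.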
Still unsolved questions:
- does integer code slow down with -mstack-align-double? (I may add some
data for this case)
- aligning argument slots/copying to spill slots - is it worth the effort?
(I'll have to look into some fp-intensive programs to see how real that
problem is. unfortunately, neither solution works in current gcc sources)
- how about doing the first scheduling pass only on fp-insns?
Hope this was useful.
PS: compiling glibc with -malign-double breaks current libstdc++ (both 2.8.1
and the egcs version, due to misalignment of _offset in struct _IO_FILE)
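
The kind of breakage described in the PS can be sketched in C as follows.
This is not the real struct _IO_FILE: file_loose and file_strict are
hypothetical two-member stand-ins, and #pragma pack / _Alignas are only used
to imitate the two layouts (4-byte vs. 8-byte alignment of a 64-bit member)
on any target.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");

/* Layout as seen by code built without -malign-double on i386:
   a 64-bit member only needs 4-byte alignment (imitated via pack). */
#pragma pack(push, 4)
struct file_loose {
    int flags;
    long long offset;          /* hypothetical stand-in for _offset */
};
#pragma pack(pop)

/* Layout as seen by code built with -malign-double:
   the same member is pushed to an 8-byte boundary. */
struct file_strict {
    int flags;
    _Alignas(8) long long offset;
};

static size_t loose_offset(void)
{
    return offsetof(struct file_loose, offset);
}

static size_t strict_offset(void)
{
    return offsetof(struct file_strict, offset);
}
```

Here loose_offset() is 4 and strict_offset() is 8, so a library built with
one layout and a caller built with the other read the same field at
different addresses.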
-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / pcg@goof.com |e|
-=====/_/_//_/\_,_/ /_/\_\ --+
The choice of a GNU generation |
|