This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



PATCH: v2 of Joel's idea to reenable non-generic on i386


[I have tried to CC everyone that is currently interested in this
 issue yet perhaps not on the libstdc++-v3 list.  Sorry if I missed
 someone.  There are now three related issues known to me: major MT
 performance regression on plain i386 on mainline (we knew about this
 issue without a good fix since Nov 2002) and 3.3 branch vs. 3.2.X;
 the effective ABI of libstdc++.so for i386 will not be the same 3.2->3.3
 (only affects MT cases); the effective ABI of libstdc++.so for i386
 will not be compatible with >i486 in gcc 3.3 (only affects MT
 cases).]

This issue was discussed back in November 2002, and I did not study it
again after getting a functionality concession:
http://gcc.gnu.org/ml/libstdc++/2002-11/msg00023.html

Here is the plan, progressing in an orderly fashion.

1. Get a new baseline atomicity.h that works for plain i386 installed
on mainline ASAP (i.e. I'm committing this version to mainline after I
hear a ping from the named co-authors, since I've looked at the
generated code in various contexts/options and I'm happy it is correct).

Here is data to display the major run-time regression in context:

generic atomic, time to make check on i686 configured as i386:
  3621r  2924.7u   241.3s       gmake -sk check

i386 atomic (v2), time to make check on i686 configured as i386:
  2988r  2297.4u   238.7s       gmake -sk check

(The above data points include much compilation, so the true speedup
 is not easy to infer.  The data points below include little compilation,
 since most of the threaded tests run much longer than they take to compile.)

generic atomic, time to make check 6 pthread cases on i686 configured as i386:
   759r   728.1u     4.5s       gmake -sk check
   781r   725.1u     4.5s       gmake -sk check
(Note that pthread[23] time out with an unknown amount of time left to run.)

i386 atomic (v2), time to make check 6 pthread cases on i686 configured as i386:
   344r   308.1u     3.4s       gmake -sk check
   358r   293.9u     3.7s       gmake -sk check
   335r   291.4u     3.3s       gmake -sk check
   344r   297.6u     3.0s       gmake -sk check

i486 atomic, time to make check 6 pthread cases on i686 configured as i686:
   322r   288.5u     3.4s       gmake -sk check
   351r   291.5u     3.3s       gmake -sk check
   647r   291.6u     3.6s       gmake -sk check

How to read this data: on a machine configured as i386, this patch
gives a >10% speedup even for non-threaded cases!  Heavily threaded
cases are over twice as fast.  The real gains are masked somewhat
since the times include all overhead to compile the test cases.

2. Test it on SMP (based on feedback, I now believe it works there and
with higher CPUs in the IA32 family).  Tweak over the next few days
based on test results.

3. Reassess the i486 special case.  I wonder if breaking the effective
ABI between i386 and i486 is a good idea based on all the people
coming out of the woodwork at the last minute on this issue. ;-)
Tweak over next few days.  Do something (at least allow a platform to
force e.g. i386 if they want this base-line ABI property).  Or,
consider making all changes required to hide atomics use inside
library (until we see a patch, this is risky, and I'm not committing
to create that patch before 3.3 is released).

4. Move 1+2+3 to the 3.3 branch, justified both by the major speed
regression and as a bug fix (the ABI already changes in 3.3, but we
can at least avoid the bug of an i386 library not being effectively
ABI compatible with i686 user code, and vice versa).
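
For anyone who wants to check the arithmetic behind the ">10%" and
"over twice as fast" readings of the step 1 data, here is a small
sketch.  The helper names (mean_seconds, percent_saved, speedup_factor)
are made up for this illustration and are not part of the patch; the
inputs are the wall-clock ("r") columns quoted above.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mean of a set of wall-clock ("r" column) timings, in seconds.
inline double mean_seconds(const std::vector<double>& runs)
{
  assert(!runs.empty());
  return std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
}

// Percentage of wall-clock time saved going from `before` to `after`.
inline double percent_saved(double before, double after)
{
  return 100.0 * (before - after) / before;
}

// How many times faster `after` is than `before`.
inline double speedup_factor(double before, double after)
{
  return before / after;
}
```

With the full "gmake -sk check" runs, percent_saved(3621, 2988) comes
out at roughly 17%.  With the pthread-only runs,
speedup_factor(mean_seconds({759, 781}), mean_seconds({344, 358, 335, 344}))
comes out at a bit over 2.2.  Since pthread[23] time out in the generic
runs, that second figure is best read as a lower bound.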

2003-04-28 Joel Sherrill  <joel dot sherrill at OARcorp dot com>
	   Loren J. Rittle <ljrittle at acm dot org>
	   Martin v. Loewis  <martin at v dot loewis dot de>

	* config/cpu/i386/atomicity.h: New file.

Index: libstdc++-v3/config/cpu/i386/atomicity.h
===================================================================
RCS file: libstdc++-v3/config/cpu/i386/atomicity.h
diff -N libstdc++-v3/config/cpu/i386/atomicity.h
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- libstdc++-v3/config/cpu/i386/atomicity.h	29 Apr 2003 04:07:35 -0000
***************
*** 0 ****
--- 1,73 ----
+ // Low-level functions for atomic operations: x86, x >= 3 version  -*- C++ -*-
+ 
+ // Copyright (C) 2003 Free Software Foundation, Inc.
+ //
+ // This file is part of the GNU ISO C++ Library.  This library is free
+ // software; you can redistribute it and/or modify it under the
+ // terms of the GNU General Public License as published by the
+ // Free Software Foundation; either version 2, or (at your option)
+ // any later version.
+ 
+ // This library is distributed in the hope that it will be useful,
+ // but WITHOUT ANY WARRANTY; without even the implied warranty of
+ // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ // GNU General Public License for more details.
+ 
+ // You should have received a copy of the GNU General Public License along
+ // with this library; see the file COPYING.  If not, write to the Free
+ // Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307,
+ // USA.
+ 
+ // As a special exception, you may use this file as part of a free software
+ // library without restriction.  Specifically, if other files instantiate
+ // templates or use macros or inline functions from this file, or you compile
+ // this file and link it with other files to produce an executable, this
+ // file does not by itself cause the resulting executable to be covered by
+ // the GNU General Public License.  This exception does not however
+ // invalidate any other reasons why the executable file might be covered by
+ // the GNU General Public License.
+ 
+ #ifndef _BITS_ATOMICITY_H
+ #define _BITS_ATOMICITY_H	1
+ 
+ typedef int _Atomic_word;
+ 
+ template <int __inst>
+ struct __Atomicity_lock
+ {
+   static volatile _Atomic_word _S_atomicity_lock;
+ };
+ 
+ template <int __inst>
+ volatile _Atomic_word __Atomicity_lock<__inst>::_S_atomicity_lock = 0;
+ 
+ static inline _Atomic_word 
+ __attribute__ ((__unused__))
+ __exchange_and_add (volatile _Atomic_word *__mem, int __val)
+ {
+   register _Atomic_word __result, __tmp = 1;
+ 
+   /* obtain the atomic exchange/add spin lock */
+   do {
+     __asm__ __volatile__ ("xchgl %0,%1"
+ 			  : "+m" (__Atomicity_lock<0>::_S_atomicity_lock),
+ 			    "+r" (__tmp));
+   } while (__tmp);
+ 
+   __result = *__mem;
+   *__mem += __val;
+ 
+   /* release spin lock */
+   __Atomicity_lock<0>::_S_atomicity_lock = 0;
+ 
+   return __result;
+ }
+ 
+ static inline void
+ __attribute__ ((__unused__))
+ __atomic_add (volatile _Atomic_word* __mem, int __val)
+ {
+   __exchange_and_add (__mem, __val);
+ }
+ 
+ #endif /* atomicity.h */
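
For readers who do not want to decode the inline asm, the technique in
the patch (take one global spin lock with an atomic exchange, do a plain
read-modify-write, release the lock) can be sketched in portable C++11.
This is only an illustration under assumptions, not the patch itself:
std::atomic_flag stands in for the xchgl loop, and the names
Atomic_word, g_lock, exchange_and_add_sketch and run_counter_demo are
hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

typedef int Atomic_word;  // hypothetical stand-in for _Atomic_word

// One global lock shared by every caller, like _S_atomicity_lock in the patch.
static std::atomic_flag g_lock = ATOMIC_FLAG_INIT;

// Adds val to *mem under the global spin lock and returns the old value.
inline Atomic_word exchange_and_add_sketch(Atomic_word* mem, int val)
{
  assert(mem != nullptr);

  // Obtain the spin lock: test_and_set plays the role of the xchgl loop.
  while (g_lock.test_and_set(std::memory_order_acquire))
    ;

  Atomic_word result = *mem;
  *mem = result + val;

  // Release the spin lock.
  g_lock.clear(std::memory_order_release);
  return result;
}

// Hypothetical driver: n_threads threads each add 1 to a shared counter
// n_iters times; returns the final counter value.
inline Atomic_word run_counter_demo(int n_threads, int n_iters)
{
  Atomic_word counter = 0;  // plain int, only touched under the lock
  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t)
    workers.emplace_back([&counter, n_iters] {
      for (int i = 0; i < n_iters; ++i)
        exchange_and_add_sketch(&counter, 1);
    });
  for (auto& w : workers)
    w.join();
  return counter;
}
```

If the lock works, run_counter_demo(4, 10000) should return 40000 with no
lost updates.  Note that, as in the patch, a single lock serializes all
callers regardless of which word they update.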

