This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
PATCH: v2 of Joel's idea to re-enable non-generic on i386
- From: Loren James Rittle <rittle at latour dot rsch dot comm dot mot dot com>
- To: libstdc++ at gcc dot gnu dot org
- Cc: martin at v dot loewis dot de, joel dot sherrill at OARcorp dot com, doko at cs dot tu-berlin dot de, angelo_f at bigpond dot com, corsepiu at faw dot uni-ulm dot de, aj at suse dot de
- Date: Tue, 29 Apr 2003 00:56:20 -0500 (CDT)
- Subject: PATCH: v2 of Joel's idea to reenabled non-generic on i386
- Reply-to: rittle at labs dot mot dot com
[I have tried to CC everyone who is currently interested in this
issue but may not be on the libstdc++-v3 list. Sorry if I missed
someone. There are now three related issues known to me:
- a major MT performance regression on plain i386, both on mainline
  (we have known about it since Nov 2002 without a good fix) and on
  the 3.3 branch vs. 3.2.X;
- the effective ABI of libstdc++.so for i386 will not be the same
  from 3.2 to 3.3 (only affects MT cases);
- the effective ABI of libstdc++.so for i386 will not be compatible
  with >i486 in gcc 3.3 (only affects MT cases).]
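The two ABI items share one mechanism. `__exchange_and_add` is an inline function, so its body ends up in user object files as well as in libstdc++.so. A lock-based body (the generic header, or the i386 one in this patch) and a single-instruction body (as far as I know, the i486 header's `lock; xaddl`) do not exclude each other when they touch the same counter. The sketch below replays one bad interleaving on a single thread so that the lost update is deterministic; `replay_mixed_builds` is a hypothetical name, and `std::atomic::fetch_add` merely stands in for the locked instruction.

```cpp
#include <atomic>
#include <cassert>

// Replays, on one thread and in a fixed order, a single bad interleaving
// of two threads that each add 1 to the same counter:
//   A: a lock-based build (take the spin lock, plain read, plain write);
//   B: an i486-style build (one locked read-modify-write that never
//      looks at A's lock).
// Returns the value left in the counter.
int replay_mixed_builds(int initial) {
  std::atomic<int> word(initial);  // the shared _Atomic_word
  bool lock_held = false;          // stands in for _S_atomicity_lock

  lock_held = true;                // A: spins until it owns the lock
  int a_seen = word.load();        // A: __result = *__mem
  word.fetch_add(1);               // B: lock; xaddl, lock_held is ignored
  word.store(a_seen + 1);          // A: *__mem += __val from its stale read
  lock_held = false;               // A: releases the lock

  assert(!lock_held);
  return word.load();              // initial + 1, although two adds ran
}
```

With an initial value of 5 the function returns 6 rather than 7: B's increment is overwritten by A's stale write, even though each implementation is correct when it is the only one in the process.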
This issue was discussed way back when, but I never studied it again
after getting a functionality concession:
http://gcc.gnu.org/ml/libstdc++/2002-11/msg00023.html
Here is the plan, step by step.
1. Install a new baseline atomicity.h that works on plain i386 on
mainline ASAP (i.e. I will commit this version to mainline once I hear
a ping from the named co-authors; I have looked at the generated code
in various contexts and with various options, and I am happy that it
is correct).
Here are data showing the major run-time regression in context (r, u
and s are real, user and system seconds):
generic atomic, time to run make check on i686 configured as i386:
  3621r 2924.7u 241.3s gmake -sk check
i386 atomic (v2), time to run make check on i686 configured as i386:
  2988r 2297.4u 238.7s gmake -sk check
(The above data points include much compilation, so the true speed-up
is hard to infer. The data points below include little compilation,
since most of the threaded tests run much longer than they take to
compile.)
generic atomic, time to run make check on the 6 pthread cases on i686 configured as i386:
  759r 728.1u 4.5s gmake -sk check
  781r 725.1u 4.5s gmake -sk check
(Note that pthread[23] time out with an unknown amount of time left to run.)
i386 atomic (v2), time to run make check on the 6 pthread cases on i686 configured as i386:
  344r 308.1u 3.4s gmake -sk check
  358r 293.9u 3.7s gmake -sk check
  335r 291.4u 3.3s gmake -sk check
  344r 297.6u 3.0s gmake -sk check
i486 atomic, time to run make check on the 6 pthread cases on i686 configured as i686:
  322r 288.5u 3.4s gmake -sk check
  351r 291.5u 3.3s gmake -sk check
  647r 291.6u 3.6s gmake -sk check
How to read these data: on a machine configured as i386, this patch
gives a >10% speed-up even for non-threaded cases! Heavily threaded
cases run over twice as fast. The real gains are somewhat masked,
since the times include all the overhead of compiling the test cases.
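To make the ">10%" and "over twice as fast" readings concrete, here is the arithmetic on the wall-clock ("r") figures above, written as small helpers. The helper names are illustrative only; the numbers are copied from the runs listed above.

```cpp
#include <cassert>
#include <cstddef>

// Arithmetic mean of n timing samples, in seconds.
double mean(const double* samples, std::size_t n) {
  assert(n > 0);
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += samples[i];
  return sum / static_cast<double>(n);
}

// Speed-up factor: old time divided by new time (2.0 means twice as fast).
double speedup(double old_seconds, double new_seconds) {
  assert(new_seconds > 0.0);
  return old_seconds / new_seconds;
}

// Fraction of the old time saved by the new code (0.25 means 25% less time).
double time_saved(double old_seconds, double new_seconds) {
  assert(old_seconds > 0.0);
  return (old_seconds - new_seconds) / old_seconds;
}

// Wall-clock times of the 6 pthread cases, copied from the runs above.
const double generic_pthread_real[] = {759.0, 781.0};
const double i386_v2_pthread_real[] = {344.0, 358.0, 335.0, 344.0};
```

For the full make check, `speedup(3621, 2988)` is about 1.21, i.e. roughly 17.5% less wall-clock time. For the pthread cases the means are 770 s (generic) and 345.25 s (i386 v2), a factor of about 2.23. Because pthread[23] timed out under the generic code, that factor is probably a lower bound.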
2. Test it on SMP (based on feedback, I now believe it works there and
on later CPUs in the IA32 family). Tweak over the next few days based
on test results.
3. Reassess the i486 special case. Given all the people coming out of
the woodwork at the last minute on this issue, I wonder whether
breaking the effective ABI between i386 and i486 is a good idea. ;-)
Tweak over the next few days. Do something (at the very least, allow a
platform to force e.g. i386 if it wants this baseline ABI property).
Alternatively, consider making all the changes required to hide the use
of atomics inside the library (until we see a patch, this is risky, and
I am not committing to create that patch before 3.3 is released).
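One possible reading of "hide the use of atomics inside the library", sketched with hypothetical names: the public header only declares the operation, and its single definition is compiled into libstdc++.so. User objects then contain a call rather than an inlined lock sequence, so i386-built and i686-built code can no longer disagree about how a counter is protected, at the cost of a function call per update. The body uses GCC's `__atomic_fetch_add` builtin, which postdates this thread and is only a stand-in for whatever the library would really use.

```cpp
#include <cassert>

typedef int atomic_word;  // mirrors the patch's _Atomic_word

// Public header (hypothetical): a declaration only. Code compiled with
// -march=i386 and code compiled with -march=i686 both emit a call here
// instead of inlining a particular locking protocol.
atomic_word library_exchange_and_add(volatile atomic_word* mem, int val);

// Library source (hypothetical): the single definition, built once into
// the shared library. Any correct implementation could be swapped in
// here without recompiling user code.
atomic_word library_exchange_and_add(volatile atomic_word* mem, int val) {
  return __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL);
}

// Hypothetical caller in the style of a reference count: drop one
// reference and report whether it was the last one.
bool release_reference(volatile atomic_word* refcount) {
  atomic_word before = library_exchange_and_add(refcount, -1);
  assert(before > 0);  // releasing an object nobody owns is a bug
  return before == 1;
}
```

In a real build the declaration and the definition would live in different files; they are shown together only so that the sketch compiles on its own.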
4. Move 1+2+3 to the 3.3 branch, justified by the major speed
regression and as a bug fix (the ABI already changes in 3.3, but we can
at least avoid the bug of an i386 library that is not effectively
ABI-compatible with i686 user code, and vice versa).
2003-04-28 Joel Sherrill <joel dot sherrill at OARcorp dot com>
Loren J. Rittle <ljrittle at acm dot org>
Martin v. Loewis <martin at v dot loewis dot de>
* config/cpu/i386/atomicity.h: New file.
Index: libstdc++-v3/config/cpu/i386/atomicity.h
===================================================================
RCS file: libstdc++-v3/config/cpu/i386/atomicity.h
diff -N libstdc++-v3/config/cpu/i386/atomicity.h
*** /dev/null 1 Jan 1970 00:00:00 -0000
--- libstdc++-v3/config/cpu/i386/atomicity.h 29 Apr 2003 04:07:35 -0000
***************
*** 0 ****
--- 1,73 ----
+ // Low-level functions for atomic operations: x86, x >= 3 version -*- C++ -*-
+
+ // Copyright (C) 2003 Free Software Foundation, Inc.
+ //
+ // This file is part of the GNU ISO C++ Library. This library is free
+ // software; you can redistribute it and/or modify it under the
+ // terms of the GNU General Public License as published by the
+ // Free Software Foundation; either version 2, or (at your option)
+ // any later version.
+
+ // This library is distributed in the hope that it will be useful,
+ // but WITHOUT ANY WARRANTY; without even the implied warranty of
+ // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ // GNU General Public License for more details.
+
+ // You should have received a copy of the GNU General Public License along
+ // with this library; see the file COPYING. If not, write to the Free
+ // Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307,
+ // USA.
+
+ // As a special exception, you may use this file as part of a free software
+ // library without restriction. Specifically, if other files instantiate
+ // templates or use macros or inline functions from this file, or you compile
+ // this file and link it with other files to produce an executable, this
+ // file does not by itself cause the resulting executable to be covered by
+ // the GNU General Public License. This exception does not however
+ // invalidate any other reasons why the executable file might be covered by
+ // the GNU General Public License.
+
+ #ifndef _BITS_ATOMICITY_H
+ #define _BITS_ATOMICITY_H 1
+
+ typedef int _Atomic_word;
+
+ template <int __inst>
+ struct __Atomicity_lock
+ {
+ static volatile _Atomic_word _S_atomicity_lock;
+ };
+
+ template <int __inst>
+ volatile _Atomic_word __Atomicity_lock<__inst>::_S_atomicity_lock = 0;
+
+ static inline _Atomic_word
+ __attribute__ ((__unused__))
+ __exchange_and_add (volatile _Atomic_word *__mem, int __val)
+ {
+ register _Atomic_word __result, __tmp = 1;
+
+ /* obtain the atomic exchange/add spin lock */
+ do {
+ __asm__ __volatile__ ("xchgl %0,%1"
+ : "+m" (__Atomicity_lock<0>::_S_atomicity_lock),
+ "+r" (__tmp));
+ } while (__tmp);
+
+ __result = *__mem;
+ *__mem += __val;
+
+ /* release spin lock */
+ __Atomicity_lock<0>::_S_atomicity_lock = 0;
+
+ return __result;
+ }
+
+ static inline void
+ __attribute__ ((__unused__))
+ __atomic_add (volatile _Atomic_word* __mem, int __val)
+ {
+ __exchange_and_add (__mem, __val);
+ }
+
+ #endif /* atomicity.h */
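For readers without an i386 box at hand, here is a portable C++11 rendition of the same technique. It is a sketch, not a replacement for the patch: one lock word, acquired by swapping in 1 until the old value reads 0 (the job `xchgl` does above), a plain read-modify-write, then a plain release store. As in the patch, the lock is a static data member of a class template; such a definition may appear in a header included by many translation units while they all still share one lock. The names `atomicity_lock`, `exchange_and_add` and `hammer` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

typedef int atomic_word;  // mirrors the patch's _Atomic_word

// One lock word shared by every translation unit, using the same
// class-template static member arrangement as __Atomicity_lock<0>.
template <int Inst>
struct atomicity_lock {
  static std::atomic<int> word;
};
template <int Inst>
std::atomic<int> atomicity_lock<Inst>::word(0);

// Same shape as the patch's __exchange_and_add: spin on an exchange until
// the previous value was 0 (unlocked), update *mem, then release the lock.
inline atomic_word exchange_and_add(volatile atomic_word* mem, int val) {
  while (atomicity_lock<0>::word.exchange(1, std::memory_order_acquire) != 0) {
    // busy-wait, like the do/while loop around xchgl
  }
  atomic_word result = *mem;
  *mem = result + val;
  atomicity_lock<0>::word.store(0, std::memory_order_release);
  return result;
}

// Starts `threads` threads that each add 1 to one shared counter `iters`
// times, waits for all of them, and returns the counter's final value.
int hammer(int threads, int iters) {
  assert(threads > 0 && iters >= 0);
  volatile atomic_word counter = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t)
    pool.emplace_back([&counter, iters] {
      for (int i = 0; i < iters; ++i) exchange_and_add(&counter, 1);
    });
  for (std::thread& th : pool) th.join();
  return counter;
}
```

`hammer(4, 10000)` should return exactly 40000, whereas a plain unguarded `*mem += val` would typically lose updates on a multiprocessor. On x86, `exchange` on a `std::atomic<int>` is normally compiled to `xchg`, which with a memory operand is implicitly locked, the same property the patch relies on.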