This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Input on new i386 atomics (was Re: Python 2.3 in testing)



Loren James Rittle wrote:
> 
> >> Btw, koenntest Du Dir mal den Vorschlag von Loren ansehen bzw. auf
> >> einer Mehrprozessormachine (falls Du sowas hast) testen?
> >> http://gcc.gnu.org/ml/libstdc++/2003-04/msg00407.html
> 
> > Hab' gerade (vor zwei Wochen) frisch eine bekommen :-)
> 
> > [I started answering in German, but now I see that it might be more
> > productive to send this in English to Loren as well]
> 
> Hi Martin, Agreed.  Although I have a German last name, I wouldn't
> know how to read German without a babelfish. ;-)
> 
> [...]
> > In the assembler code, I see none of this: no 1, but instead it looks
> > like the value of __lock gets swapped with itself. Apparently, the
> > declaration of input registers went wrong, it probably should be
> 
> >   "0" (__tmp)
> 
> > instead.
> 
> Agreed, I had only looked far enough to fix the compilation error but
> I see exactly what you are writing about (and I have now re-read the
> __asm__ section of the gcc manual for good measure).  How about this instead:
> 
> 48,51c48
> <     __asm__ __volatile__("xchgl %0,%1"
> <                        :"=r" (__tmp)
> <                        :"m" (__lock), "0" (__lock)
> <                        :"memory");
> ---
> >     __asm__ __volatile__ ("xchgl %0,%1" : "+m" (__lock), "+r" (__tmp));
> 
> (The asm loop for all context and options I tried now looks correct.
>  BTW, there are no undisclosed memory or condition code affects thus I
>  removed the needless (IMHO) "memory" statement.)

Sorry for the bug. It was mine.  Thanks for fixing it.

> > If that part is corrected, it seems to me that the answer to the
> > question "Why is this OK?" is straight forward: We are holding the
> > lock, so we can release it by just assigning to it.
> 
> >From looking at the idiom used by others in various i386-based
> kernels, I too now think that a simple write back is fine.  The reason
> I added that comment was to get eagle eyes on the problem. ;-)
> 
> > I'm not quite sure what problem this is going to solve, though, for
> > Debian: There is an ABI change where, on i386, one needs to hold
> > __lock to perform the atomic add. This poses two questions
> 
> At the moment, my version of Joel's patch just fixes a major speed
> regression introduced Nov 2002 when i386-*-* was (rightly, pending
> further research) switched to generic atomics as a bug fix.
> 
> FYI, thus we already have a major change installed on mainline/3.3
> which is not in 3.2 that breaks this ABI for i386 machines.  It was
> (rightly) called a bug fix at the time the change was installed since
> xaddl doesn't exist on plain i386 yet somehow snuck into a file with
> i386 support before gcc 3.0 was released.

The set of generic i386 users is probably quite small in the general
UNIX community.  The embedded i386 community is probably larger with
i386ex and other new cores seeing new development activities.  

FWIW gcc 3.x has been slow to be adopted by the embedded community.
If you ever watch the crossgcc mailing list, you will see a lot
of m68k users still on a patched gcc 2.95.3.  gcc 3.0 was horribly
broken for most embedded targets, gcc 3.1 didn't last long enough 
and wasn't stable enough for the RTEMS community to do more than
get a subset of targets going and some users switching.  gcc 3.2 is 
really the first branch where you are starting to see embedded users.
This is where you are going to start seeing problems on lower end
CPU models in families.

> >1. Where is __lock expected to live? Currently, it gets into all
> >   object files, so it would always be present. Due to the nature
> >   of symbol resolution, you might end up with multiple copies of
> >   __lock, though, under obscure circumstances.
> 
> I have addressed that issue (at least the same way we do it for all
> other CPUs) with a new version of the patch that I will include in the
> next e-mail.  At least according to the rationale used in the other
> uses of the idiom, C++ linkage must guarantee one copy in the final
> executable.  If this is untrue, we would have >5 copies of atomicity.h
> to now fix. ;-)
> 
> >   If it is supposed to live in libstdc++: How would binaries built
> >   against this version of the ABI run on installations that assume
> >   486+? where the variable is not present in libstdc++?
> 
> Sorry, but that issue was broken by other patches (e.g. the one that
> forced i386 but not higher level CPUs to use generic) and if an
> important concern to the OS packagers, we had better fix it before the
> gcc 3.3 release.  OTOH, why does anyone presume that a compiler triple
> i386-*-*- is completely compatible with i686-*-*? (rhetorical question ;-)

But important to answer. :)

In the embedded community, when we build a target CPU-XXX, we EXPECT
that
the generated code and multilib'ed libraries are appropriate to support
ALL of the CPU models within that family using OS XXX.  That is the
case for arm, m68k, powerpc, mips, sparc, sh, etc. and the i386 should
be
no different.

If the libstdc++-v3 is worried about someone using macros/inlines in
header files with one cpu model setting and linking against library
code compiled a different way, then the ONLY solution in the current
framework is to move the code to real functions.  This lets the
existing multilib machinery work and solve things.  If you end up with
everyone using an i386 version, so be it.  But at least a function
is more controllable and you can determine its contents at 
libstdc++-v3 compile time.

Ralf has mentioned that multilib'ed header files is also an option
but that isn't supported now AFAIK.

> >2. What happens if you mix i386 and i486+ ABI in a single program?  It
> >   appears to me that the following sequence of actions might be
> >   possible, if two threads simultaneously try to modify the same
> >   _Atomic_word, one with the i386 ABI, and one with the i486 ABI:
> >   a) i386 obtains __lock
> >   b) i386 reads value of __mem into __result
> >   c) i386 reads value of __mem for addition
> >   d) i486 performs atomic addition, giving __mem+486
> >   e) i386 performs addition, and writes back __mem+386
> >   f) i386 releases __lock
> >   g) both return __mem
> >   As a result, the variable is now __mem+386, whereas it should
> >   be __mem+386+486, so the write operation of i486 was lost.
> 
> Yup, I think that you are precisely correct regarding the situation on
> mainline and 3.3 branch both before and after Joel's patch (with your
> fix and my tweaks) to reenable an optimized i386 spin lock...
> 
> Sorry, but there will be an effective ABI change between 3.2.X and 3.3
> in libstdc++-v3.  We now need to figure out how to make
> sure that there will be none between 3.3/i386 and 3.3/i686.
> (also, rhetorical, I will have an orthogonal patch for that issue.)

Unless every CPU family ends up with the least common denominator 
implementation, the only solution I see is that this type of code
has to move to functions.  Otherwise, you have no control.

A bit of overhead perhaps but WAY more control over what version 
everyone is using.

> Regards,
> Loren

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel at OARcorp dot com                 On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]