This is the mail archive of the java-discuss@sources.redhat.com mailing list for the Java project.



Re: Segment Register thread descriptor (was Re: Jv_AllocBytesChecked)


"Boehm, Hans" <hans_boehm@hp.com> writes:

> 2) An inline function for retrieving such a thread id as fast as possible.

And where do you want to use this?

> 3) Some collection of atomic update primitives.  I could live with
> compare-and-swap as the only one.  There's a performance advantage to be
> gained from providing more, but it's much smaller than the difference
> between compare-and-swap and explicit locking.

This is already horribly complicated.  Since it would also have to
work on architectures (revisions) which don't have such primitives
built in it means you'll have to be able to emulate this.  But this
means you have to have special ways to define such atomically updated
objects.  In some cases it'll not only be the object, you might have to
define a spinlock with it.  (Sometimes not even a spinlock is
sufficient.)

We have this problem in the thread library and it's ugly.  Exporting
this means putting this ugliness in stone.

As for the different functions, it's certainly possible to signal the
user which functions are natively available and which would have to be
emulated.  This would allow a user to write code perhaps in two
different ways, depending on what functionality is available.
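
A minimal sketch of what "writing the code in two different ways" could look like, using the ATOMIC_INT_LOCK_FREE macro that C++11 (which postdates this thread) defines in <atomic>.  The Counter type and its increment() member are hypothetical names chosen for illustration only.

```cpp
#include <atomic>
#include <mutex>

// ATOMIC_INT_LOCK_FREE comes from <atomic>:
// 2 = always lock-free, 1 = sometimes, 0 = never.
#if ATOMIC_INT_LOCK_FREE == 2

// Native variant: compare-and-swap retry loop, no lock object needed.
struct Counter {
    std::atomic<int> value{0};

    int increment() {
        int old = value.load(std::memory_order_relaxed);
        while (!value.compare_exchange_weak(old, old + 1)) {
            // On failure 'old' now holds the current value; just retry.
        }
        return old + 1;
    }
};

#else

// Emulated variant: the object has to carry its own lock.
struct Counter {
    std::mutex lock;
    int value = 0;

    int increment() {
        std::lock_guard<std::mutex> guard(lock);
        return ++value;
    }
};

#endif
```

Callers see the same interface in both cases; only the object layout and the cost of increment() differ.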

>  	- There should be variants for at least int and long.

This is wrong.  All operations should be available only for intX_t and
uintX_t types (perhaps intmax_t and uintmax_t).  Additionally perhaps
the same for the int_fastX_t and uint_fastX_t types if this makes a
difference.

> 	- There should be variants that add different memory barrier
> 	  properties.  The ones I've found useful for compare-and-swap are:
> 		- none
> 		- acquire (later memory operations cannot move forward)
> 		- release (earlier memory operations cannot move backwards
> 		  past the atomic op.)

The question is whether this is really all and whether it's useful to
make the differentiation at this high level at all.  There might be
new processor implementations which introduce even more variants.

> 	- They have a lock parameter, of a type declared in the header file.
> 	  This parameter is unused if the hardware provides the operation.
> 	  If the hardware fails to provide the operation, the call attempts
> 	  to acquire the lock for the duration of the operation.

This is very cumbersome.  The definitions should take care of this
themselves.  Again, look at what we have done in glibc.


> 4) Suitable macro definitions to allow locks for atomic operations to be
> introduced and used so that they disappear when they are not needed, e.g.
> a DECL_LOCK(name) macro to declare the lock, and a LOCK_ARG(name) macro to
> pass it if it's needed, or to pass zero otherwise.

Maybe this is what I meant above.  I am not sure, though.  In short,
one would need macros like

#define DEFVAR(type, var) \
   type var; \
   lock_type_t var##_lock

for an architecture without support for atomic operations and just

#define DEFVAR(type, var) \
   type var

otherwise.  On an architecture without atomic operations the macro
for using the variable can be

#define UPDATE(var, val) \
   lock (var##_lock); \
   var = val; \
   unlock (var##_lock)
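
Put together, the two variants of these macros might look like the sketch below.  HAVE_NATIVE_ATOMICS is a hypothetical configuration macro, std::mutex stands in for lock_type_t, and the native branch assumes the GCC/Clang __atomic_store_n builtin; none of this is taken from glibc.

```cpp
#include <mutex>

#ifdef HAVE_NATIVE_ATOMICS   /* hypothetical: set by the build system */

/* Hardware provides the operation: no lock is declared at all. */
# define DEFVAR(type, var) \
    type var
# define UPDATE(var, val) \
    __atomic_store_n (&(var), (val), __ATOMIC_SEQ_CST)

#else

/* Emulation: every variable gets a companion lock named var_lock. */
# define DEFVAR(type, var) \
    type var; \
    std::mutex var##_lock
# define UPDATE(var, val) \
    do { \
      std::lock_guard<std::mutex> guard_ (var##_lock); \
      (var) = (val); \
    } while (0)

#endif

DEFVAR (int, shared_counter);

/* Demo helper: the unlocked read is only safe because the demo is
   single-threaded. */
int
set_and_read (int v)
{
  UPDATE (shared_counter, v);
  return shared_counter;
}
```

Wrapping the emulated UPDATE in do { ... } while (0) keeps it a single statement, so it stays correct after an unbraced if.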


> 5) A way to write to a memory location with release semantics, e.g. at the
> end of critical section entered with compare-and-swap.  (My impression is
> that on X86 this is just an assignment, but it requires some magic to tell
> the compiler not to move other operations past it.)

This is actually showing quite dramatically how difficult it is to
have definitions which match everywhere.

There are generally two solutions:

- have instructions which explicitly provide the semantics (IA-64, ...)

- use memory barriers

Depending on what is available you write the code differently.  If the
critical region contains more than one store operation the question is
what is best to use.  If only memory barriers are available you put
exactly one write memory barrier before the end of the section.  If
you have release semantics in the instructions you'll use it for all
stores which can be the last ones in the critical region.  What do you
do for something like this

     BEGIN_CR

       a = ...some value...

       if (foo)
          b = ...another one...

     END_CR

Besides the case that one should probably care for architectures which
have no memory barriers but do have the special store instructions
there is the case where both types are available.  So you will have to
write the sources like this

     BEGIN_CR

       STORE_REL (a, ...some value...)

       if (foo)
          STORE_REL (b, ...another one...)

       WMB

     END_CR

But this will suck on architectures with both, st.rel and memory
barriers.  Which one to use depends on the actual processor
implementation.  So you'll have to introduce special STORE_REL and WMB
macro variants which are used if they are used together as in the
example above.
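
For comparison, here is a sketch of the same critical region written with C++11 atomics, which postdate this discussion.  The region is entered with an acquiring compare-and-swap and left with a single release store, and the choice between an st.rel-style instruction and a barrier is left to the compiler for each target.  The names cr_lock, begin_cr, end_cr and update are hypothetical.

```cpp
#include <atomic>

std::atomic<int> cr_lock{0};   // hypothetical lock word: 0 = free, 1 = held
int a = 0;
int b = 0;

// Enter the region: spin on compare-and-swap with acquire semantics.
inline void begin_cr() {
    int expected = 0;
    while (!cr_lock.compare_exchange_weak(expected, 1,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed)) {
        expected = 0;  // a failed CAS overwrote it with the observed value
    }
}

// Leave the region: one store with release semantics covers every
// store made inside the region, however many there were.
inline void end_cr() {
    cr_lock.store(0, std::memory_order_release);
}

int update(bool foo) {
    begin_cr();
    a = 1;
    if (foo)
        b = 2;
    int sum = a + b;   // read while still inside the region
    end_cr();
    return sum;
}
```

This sidesteps the STORE_REL/WMB pairing question only because the release is attached to the unlocking store rather than to the stores inside the region.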

There are probably more such cases.


> An empty volatile asm seems to do the trick with gcc, but may be
> overkill.

No, an empty volatile asm has no effect on the processor.  Memory
barriers and st.rel instructions constrain the reordering of memory
writes and reads that takes place during execution in the processor.
The asm volatiles only affect the compiler.  This is by far not
enough for modern architectures.  Even on x86 we now have explicit
memory barrier instructions.
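
The distinction can be sketched as follows, assuming GCC/Clang inline asm syntax for the compiler-only barrier and C++11 std::atomic_thread_fence (not available when this was written) for the one that also constrains the processor.  All function and variable names are hypothetical.

```cpp
#include <atomic>

// Compiler-only barrier: the compiler may not move memory accesses
// across it, but no machine instruction is emitted.
inline void compiler_barrier() { asm volatile("" ::: "memory"); }

// Fences that also constrain the processor.  The compiler emits whatever
// instruction the target needs, which on some targets may be none.
inline void release_barrier() {
    std::atomic_thread_fence(std::memory_order_release);
}
inline void acquire_barrier() {
    std::atomic_thread_fence(std::memory_order_acquire);
}

int payload = 0;
std::atomic<bool> ready{false};

// Writer: make the payload visible before the flag.
void publish(int value) {
    payload = value;
    release_barrier();   // compiler_barrier() alone would not be enough
    ready.store(true, std::memory_order_relaxed);
}

// Reader: returns -1 if nothing has been published yet.
int try_consume() {
    if (!ready.load(std::memory_order_relaxed))
        return -1;
    acquire_barrier();
    return payload;
}
```

Replacing release_barrier() with compiler_barrier() would still compile, but on a weakly ordered processor another thread could then see the flag before the payload.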


Which brings me to the next point: using these definitions makes
applications either less portable (e.g., using the atomic operations
requires new revisions of the hardware and therefore the programs
cannot be used on old hardware) or the programs are very slow (e.g.,
x86 applications would have to take i386 compatibility into account
which prevents almost all atomic operations the processor knows about
from being used).

If we used function calls for everything we would not have to start
thinking about this.  The consequence is that unless you are willing
to accept that binaries have to be compiled for the specific
architecture, you will restrict the platforms benefitting from these
new definitions to the very modern ones.  All architectures with
history (which includes even Alpha) have to use blended code which is
slow.


> 6) Everything needs to be in the system reserved part of the C namespace,
> since it should be usable from C, but also from standard C++ headers.

How this has to be implemented is a different story.  I wouldn't make
it a requirement that the same definitions are used.  C++, for
instance, allows using sentry objects like


  int
  foo (int a)
  {
    int b = bar (a);

    {
      sentry s;
      b += global;
    }

    return b + baz (a);
  }

Here sentry would be a special type responsible for creating a
critical region.  With the constructor and destructor being inlined
this can be fast.  In any case it is easy to use, even in the presence
of exception handling.
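
A minimal sketch of such a sentry type, assuming a std::mutex named global_lock protects the variable; the lock and the bodies of bar and baz are placeholders invented for this example.

```cpp
#include <mutex>

std::mutex global_lock;   // hypothetical lock protecting 'global'
int global = 5;

// The constructor enters the critical region and the destructor leaves
// it, so the region also ends if an exception unwinds the block.
class sentry {
public:
    sentry() { global_lock.lock(); }
    ~sentry() { global_lock.unlock(); }

    // A sentry owns the region; copying it would unlock twice.
    sentry(const sentry&) = delete;
    sentry& operator=(const sentry&) = delete;
};

int bar(int a) { return a * 2; }   // placeholder
int baz(int a) { return a + 1; }   // placeholder

int foo(int a) {
    int b = bar(a);

    {
        sentry s;       // critical region begins
        b += global;
    }                   // critical region ends here

    return b + baz(a);
}
```

On a platform with suitable atomic instructions the constructor and destructor could be replaced by something cheaper without touching foo.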



It is not that I have never given these things any thought.  In fact,
I have been trying to find a solution for these problems for years.
But it is hard to do it right, and whenever you think that the current
set is sufficient somebody comes up with something which cannot be
handled efficiently, or the processor guys come up with yet another
fancy way to make themselves look good.


> If there's some consensus that this would be a step in the right direction,
> I'm willing to generate such a header file for the platforms I have easy
> access to.  But we would need to get at least some agreement that this also
> makes sense for C++.  (I can also put in fetch and add explicitly, if that
> would help convince people.)

I think it is wrong to start right away with proposing interfaces.
First describe the problems and give examples (and make sure you are
covering only the limited range of problems associated with Java;
e.g., an OpenMP compliant FORTRAN implementation has some other
problems).  This will help people who know a lot about a specific
processor to decide how this can be best implemented given the
architecture's features.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------
