50457 – SH2A atomic functions

Bug 50457 - SH2A atomic functions

Summary: SH2A atomic functions

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.3.5

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-09-19 16:36 UTC by philip.stearns.andtr
Modified:	2019-06-14 19:21 UTC (History)
CC List:	1 user (show)

See Also:	67660
Host:
Target:	sh--*
Build:
Known to work:
Known to fail:
Last reconfirmed:	2012-09-24 00:00:00

Attachments
cleanup libgcc/config/sh/linux-atomic (4.52 KB, patch) 2012-10-02 00:55 UTC, Oleg Endo	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description philip.stearns.andtr 2011-09-19 16:36:48 UTC

The SH specific atomic functions don't seem to work properly with the SH2A processor. An example function from the gcc/config/sh/linux-atomic.asm is as follows:

#define ATOMIC_FETCH_AND_OP(OP,N,T) \
	.global	__sync_fetch_and_##OP##_##N; \
	HIDDEN_FUNC(__sync_fetch_and_##OP##_##N); \
	.align	2; \
__sync_fetch_and_##OP##_##N:; \
	mova	1f, r0; \
	mov	r15, r1; \
	mov	#(0f-1f), r15; \ <<< SP modification
0:	mov.##T	@r4, r2; \
	OP	r2, r5; \
	mov.##T	r5, @r4; \
1:	mov	r1, r15; \ <<< SP modification
	rts; \
	 mov	r2, r0; \
	ENDFUNC(__sync_fetch_and_##OP##_##N)

Functions following this style cause the application to eventually crash. I think this may be related to the locking mechanism around the instructions making use of an on-board MMU, since the SP is loaded with a negative value. As the SH2A does not have an MMU this poses a problem. Replacing the SP modifying code with direct interrupt disabling/enabling resolves the problem and the application no longer crashes:

#define ATOMIC_FETCH_AND_OP(OP,N,T,EXT) \
	.global	__sync_fetch_and_##OP##_##N; \
	HIDDEN_FUNC(__sync_fetch_and_##OP##_##N); \
	.align	2; \
__sync_fetch_and_##OP##_##N:; \
	stc	sr, r0; \
	mov	r0, r1; \
	or	#0xf0, r0; \
	ldc	r0, sr; \ <<< interrupt disable
	mov.##T	@r4, r2; \
	OP	r2, r5; \
	mov.##T	r5, @r4; \
	ldc	r1, sr; \ <<< interrupt restore
	rts; \
	 EXT	r2, r0; \
	ENDFUNC(__sync_fetch_and_##OP##_##N)

Comment 1 Oleg Endo 2012-02-26 23:07:41 UTC

(In reply to comment #0)
> The SH specific atomic functions don't seem to work properly with the SH2A
> processor. An example function from the gcc/config/sh/linux-atomic.asm is as
> follows:
> 
> [...]
>  
> Functions following this style cause the application to eventually crash. I
> think this may be related to the locking mechanism around the instructions
> making use of an on-board MMU, since the SP is loaded with a negative value. 
> As the SH2A does not have an MMU this poses a problem. 

This is the default GNU/Linux gUSA ABI, which signals the 'in atomic sequence' condition with a negative SP in the range somewhere between -128...-1.  The interrupt/exception handling code must check for this condition before transferring control to some other execution context (other thread or direct interrupt handler).
Please notice that starting with GCC 4.7 the atomic sequence implementation has changed and now has to be enabled with the -msoft-atomics option.  For a slightly more detailed description of the machinery please see the comments in gcc/config/sh/sync.md.

> Replacing the SP modifying
> code with direct interrupt disabling/enabling resolves the problem and the
> application no longer crashes:
> 
> [...]

While flipping interrupts on/off might fix the problem it is not recommended to do so because of performance reasons (slower code on average, increased interrupt latency etc).  It is better and easier to fix the interrupt/exception handling code instead.

Comment 2 Oleg Endo 2012-02-26 23:25:51 UTC

As described in comment #1 this is not a compiler issue, but rather a system/OS interrupt/exception handling issue.

Comment 3 Oleg Endo 2012-09-24 14:19:41 UTC

Uhm, I've been thinking .... 

In fact, the current atomic sequences (-msoft-atomic) are not compatible with anything but SH3* and SH4*.  This is because everything < SH3 (including SH2A) pushes SR and PC on the stack (R15) when an exception occurs, and for that the stack pointer must always be valid.  The crashes mentioned in the original description are caused by the invalid stack pointer.  I'm sorry for not noticing this when replying/closing this PR prematurely.

I don't know how linux/glibc have been handling atomic ops on SH2 or SH2A, but I've got an idea that would work in a bare-metal setup.

Instead of signaling the 'is in atomic sequence' condition through R15, it can be signaled through a thread local variable.  For example a compare_and_swap sequence might look like this:

        mov          #(0f-1f),r1       ! sequence length is held in R1
        mova        1f,r0
	.align 2
        mov.l        r0,@(#x,gbr)    ! enter atomic sequence by setting the
                                                ! exit point to the thread local variable.
                                                ! #x is user defined through -m option
0:
        mov.<bwl> @r1,%0
        mov            #0,r0
        cmp/eq       %0,%4
        bf                1f
        mov.<bwl> %3,@%1
1:     mov.l           r0,@(#x,gbr)    ! set thread local variable to nullptr,
                                                   ! which exits the atomic sequence.

The check in the exception handling code for the 'is in atomic sequence'
would be:
  @(#x,gbr) != 0 && @(0,r15) < @(#x,gbr)

The atomic sequence rewind step in the exception handling code would be:
  @(0,r15) = @(#x,gbr) + r1

My assumption here is that gbr points to the execution context
housekeeping structure, which is also accessible from user space.  I've briefly
checked the code of glibc and this seems to be the case.

Problems will occur when the gbr is modified by user code (inline asm etc).
So user code should not modify the gbr, or at least do it in a controlled way.
But I think this issue can be ignored for now.

The execution context struct must be modified to hold the additional variable
for atomic sequences.

The thread library / kernel will need a modification (the kernel will need a modification
on SH2(A) anyway), so that @(#x,gbr) is initialized to zero on thread creation.

Kaz, could you please provide some feedback on this idea?

Comment 4 Kazumoto Kojima 2012-09-25 00:53:23 UTC

(In reply to comment #3) 
> I don't know how linux/glibc have been handling atomic ops on SH2 or SH2A, but
> I've got an idea that would work in a bare-metal setup.

There is no glibc port for SH2* and highly unlikely someone will
add it.  SH2* linux uses and will use totally different environment
instead of glibc/nptl.  Those CPUs have no user/privilege modes and
folks tend to use disable/enable interrupts for the atomicity on them
like as in #0.
*-linux configuration of gcc assumes glibc/nptl everywhere.  It means
that only sh[34]*-linux configuration will work well.  For this PR,
the current atomic functions or builtins should be disabled for
sh2*-linux configuration and add something that works on SH2*.
You would have a free hand to the implementation in this case,
since currently there is nothing correct for SH2*.

Comment 5 Oleg Endo 2012-09-25 10:53:36 UTC

(In reply to comment #4)
> There is no glibc port for SH2* and highly unlikely someone will
> add it.  SH2* linux uses and will use totally different environment
> instead of glibc/nptl.  Those CPUs have no user/privilege modes and
> folks tend to use disable/enable interrupts for the atomicity on them
> like as in #0.
> *-linux configuration of gcc assumes glibc/nptl everywhere.  It means
> that only sh[34]*-linux configuration will work well.  For this PR,
> the current atomic functions or builtins should be disabled for
> sh2*-linux configuration and add something that works on SH2*.
> You would have a free hand to the implementation in this case,
> since currently there is nothing correct for SH2*.

Thanks for the feedback.
I'd like to propose to do the following ...

- Add new option to select atomics model
  -matomic=
  
  With the following supported models:
    soft-gusa                 (SH3*, SH4* only)
    soft-tcb,gbr-offset=xxxx  (SH1*,SH2*,SH3*,SH4*)
    hard-llcs                 (SH4A only)
  
  I think this is easier to handle than multiple individual options.

- Make current option -msoft-atomic an alias for -matomic=soft-gusa.

- Remove option -mhard-atomic.  It's new in 4.8 and thus we don't need
  an alias for that.

- Issue compiler error on unsupported atomic model for a given target
  (e.g. -m2a -matomic=soft-gusa --> error)
  
- Add TCB (thread control block) soft atomics variations.
  BTW, they would be compatible across all SH 1..4 types.  E.g. running
  precompiled SH2 libraries/binaries on SH4 would work -- if anyone wanted
  to do so.

Does this make sense?

Comment 6 Kazumoto Kojima 2012-09-25 11:33:09 UTC

(In reply to comment #5)
> Does this make sense?

Sounds reasonable.

Comment 7 Oleg Endo 2012-10-01 08:34:11 UTC

Author: olegendo
Date: Mon Oct  1 08:34:02 2012
New Revision: 191899

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191899
Log:
	PR target/50457
	* config/sh/sh.opt (matomic-model): New option.
	(msoft-atomic): Mark as deprecated and alias to matomic-model=soft-gusa.
	(mhard-atomic): Delete.
	* config/sh/predicates.md (gbr_displacement): New predicate.
	* config/sh/sh-protos.h (sh_atomic_model): New struct.
	(selected_atomic_model): New declaration.
	(TARGET_ATOMIC_ANY, TARGET_ATOMIC_STRICT, TARGET_ATOMIC_SOFT_GUSA,
	TARGET_ATOMIC_HARD_LLCS, TARGET_ATOMIC_SOFT_TCB,
	TARGET_ATOMIC_SOFT_TCB_GBR_OFFSET_RTX, TARGET_ATOMIC_SOFT_IMASK):
	New macros.
	* config/sh/linux.h (SUBTARGET_OVERRIDE_OPTIONS): Adapt setting to
	default atomic model.
	* config/sh/sh.c (selected_atomic_model_): New global variable.
	(selected_atomic_model, parse_validate_atomic_model_option): New
	functions.
	(sh_option_override): Replace atomic selection checks with call to
	parse_validate_atomic_model_option.
	* config/sh/sh.h (TARGET_ANY_ATOMIC, UNSUPPORTED_ATOMIC_OPTIONS,
	UNSUPPORTED_HARD_ATOMIC_CPU): Delete.
	(DRIVER_SELF_SPECS): Remove atomic checks.
	config/sh/sync.md: Update documentation comments.
	(atomic_compare_and_swap<mode>, atomic_exchange<mode>,
	atomic_fetch_<fetchop_name><mode>, atomic_fetch_nand<mode>,
	atomic_<fetchop_name>_fetch<mode>, atomic_nand_fetch<mode>): Use
	TARGET_ATOMIC_ANY as condition.  Add TARGET_ATOMIC_STRICT check for
	SH4A case.  Handle new TARGET_ATOMIC_SOFT_TCB and
	TARGET_ATOMIC_SOFT_IMASK cases.
	(atomic_test_and_set): Handle new TARGET_ATOMIC_SOFT_TCB and
	TARGET_ATOMIC_SOFT_IMASK cases.
	(atomic_compare_and_swapsi_hard, atomic_exchangesi_hard,
	atomic_fetch_<fetchop_name>si_hard, atomic_fetch_nandsi_hard,
	atomic_<fetchop_name>_fetchsi_hard, atomic_nand_fetchsi_hard):
	Add TARGET_ATOMIC_STRICT check.
	(atomic_compare_and_swap<mode>_hard, atomic_exchange<mode>_hard,
	atomic_fetch_<fetchop_name><mode>_hard, atomic_fetch_nand<mode>_hard,
	atomic_<fetchop_name>_fetch<mode>_hard, atomic_nand_fetch<mode>_hard,
	atomic_test_and_set_hard): Use TARGET_ATOMIC_HARD_LLCS condition.
	(atomic_compare_and_swap<mode>_soft, atomic_exchange<mode>_soft,
	atomic_fetch_<fetchop_name><mode>_soft, atomic_fetch_nand<mode>_soft,
	atomic_<fetchop_name>_fetch<mode>_soft, atomic_nand_fetch<mode>_soft,
	atomic_test_and_set_soft): Append _gusa to the insn names and use
	TARGET_ATOMIC_SOFT_GUSA as condition.
	(atomic_compare_and_swap<mode>_soft_tcb, atomic_exchange<mode>_soft_tcb,
	atomic_fetch_<fetchop_name><mode>_soft_tcb,
	atomic_fetch_nand<mode>_soft_tcb,
	atomic_<fetchop_name>_fetch<mode>_soft_tcb,
	atomic_nand_fetch<mode>_soft_tcb, atomic_test_and_set_soft_tcb):
	New insns.
	(atomic_compare_and_swap<mode>_soft_imask,
	atomic_exchange<mode>_soft_imask,
	atomic_fetch_<fetchop_name><mode>_soft_imask,
	atomic_fetch_nand<mode>_soft_imask,
	atomic_<fetchop_name>_fetch<mode>_soft_imask,
	atomic_nand_fetch<mode>_soft_imask, atomic_test_and_set_soft_imask):
	New insns.
	* doc/invoke.texi (SH Options): Document new matomic-model option.
	Remove msoft-atomic and mhard-atomic options.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/linux.h
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.h
    trunk/gcc/config/sh/sh.opt
    trunk/gcc/config/sh/sync.md
    trunk/gcc/doc/invoke.texi

Comment 8 Oleg Endo 2012-10-01 13:49:44 UTC

Hm, we still have libgcc/config/sh/linux-atomic.S around, which provides gUSA only implementations of atomic functions.  How about replacing this with a C implementation instead?  The C functions would then simply invoke the __builtin atomic functions.  Would that be OK to do?

Comment 9 Kazumoto Kojima 2012-10-01 23:29:59 UTC

(In reply to comment #8)
> Would that be OK to do?

Yes, that sounds good for maintenance.

Comment 10 Oleg Endo 2012-10-01 23:39:12 UTC

(In reply to comment #9)
> (In reply to comment #8)
> > Would that be OK to do?
> 
> Yes, that sounds good for maintenance.

OK, I've already got something here, but I'm confused about one thing.
In config/sh/linux-atomic.S the function names go like ...

	.global	__sync_lock_test_and_set_##N; \
	HIDDEN_FUNC(__sync_lock_test_and_set_##N); \

(two leading underscores)


When no atomic model is selected the compiler will turn this

int test (int* x, int y, int z)
{
  return __sync_val_compare_and_swap (x, y, z);
}

into:
	.global	_test
	.type	_test, @function
_test:
	mov.l	.L2,r0	! 11	movsi_ie/1	[length = 2]
	jmp	@r0
	nop	! 12	sibcall_valuei	[length = 4]
.L3:
	.align 2
.L2:
	.long	___sync_val_compare_and_swap_4

(three leading underscores)

Looking at config/arm/linux-atomic.c, the C functions all start with two leading underscores, which will result in three leading underscores in the .asm file.

How does this fit together in the SH case?  The compiler generates refs to '___sync*' but in the .S file there are only '__sync' symbols.  I'm afraid I'm missing something here...

Comment 11 Oleg Endo 2012-10-02 00:55:42 UTC

Created attachment 28321 [details]
cleanup libgcc/config/sh/linux-atomic

This is what I've got so far as a cleanup of the libgcc/config/sh/linux-atomic* stuff.  I had to add built-in defines for the atomic model and while at it, I've moved the TARGET_CPU_CPP_BUILTINS macro implementation from sh.h to sh-c.c, as it is done on some other targets.

Kaz, could you please try out this patch?

I'm not sure, but maybe it will require adding symbol aliases for __sync* -> ___sync* ...

Comment 12 Kazumoto Kojima 2012-10-02 02:17:51 UTC

(In reply to comment #11)
> Created attachment 28321 [details]
> cleanup libgcc/config/sh/linux-atomic

Works fine for me.

> (three leading underscores)

You might get one extra underscore because embed-elf SH compiler uses one
underscore as USER_LABEL_PREFIX.  It's OK for linux which uses null
USER_LABEL_PREFIX.  I've got

	.long	__sync_val_compare_and_swap_4

with sh4-unknown-linux-gnu gcc for that case.

Comment 13 Oleg Endo 2012-10-02 10:57:59 UTC

(In reply to comment #12)
> (In reply to comment #11)
> > Created attachment 28321 [details]
> > cleanup libgcc/config/sh/linux-atomic
> 
> Works fine for me.
> 
> > (three leading underscores)
> 
> You might get one extra underscore because embed-elf SH compiler uses one
> underscore as USER_LABEL_PREFIX.  It's OK for linux which uses null
> USER_LABEL_PREFIX.  I've got
> 
>     .long    __sync_val_compare_and_swap_4
> 
> with sh4-unknown-linux-gnu gcc for that case.

Thanks for the explanation!
Patch submitted: http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00162.html

Comment 14 Oleg Endo 2012-10-03 21:36:18 UTC

Author: olegendo
Date: Wed Oct  3 21:36:14 2012
New Revision: 192051

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192051
Log:
	PR target/50457
	* config/sh/sh.c (parse_validate_atomic_model_option): Handle name
	strings in sh_atomic_model.
	* config/sh/sh.h (TARGET_CPU_CPP_BUILTINS): Move macro implementation
	to ...
	* config/sh/sh-c.c (sh_cpu_cpp_builtins): ... this new function.
	Add __SH1__ and __SH2__ defines.  Add __SH_ATOMIC_MODEL_*__ define.
	* config/sh/sh-protos.h (sh_atomic_model): Add name and cdef_name
	variables.
	(sh_cpu_cpp_builtins): Declare new function.

	PR target/50457
	* config/sh/linux-atomic.S: Delete.
	* config/sh/linux-atomic.c: New.
	* config/sh/t-linux (LIB2ADD): Replace linux-atomic.S with
	linux-atomic.c.  Add cflags to disable warnings.


Added:
    trunk/libgcc/config/sh/linux-atomic.c
Removed:
    trunk/libgcc/config/sh/linux-atomic.S
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh-c.c
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.h
    trunk/libgcc/ChangeLog
    trunk/libgcc/config/sh/t-linux

Comment 15 Oleg Endo 2012-10-08 00:21:09 UTC

Hm, I was wondering whether the atomic model should also be checked in sh_check_pch_target_flags or not?

Comment 16 Oleg Endo 2012-11-12 20:57:12 UTC

Another thing that might be useful for dealing with atomics on SH1* and SH2* targets is to make the compiler generate the rewind sequences in interrupt handler functions.

On SH3* and SH4* targets, the interrupt handling jump table is usually designed in such a way, that it can contain (small) setup code before a user specified interrupt handling function is invoked.
However, on SH1* and SH2* targets, the processor directly jumps to the user specified interrupt handling function via the interrupt jump table.  It would be possible to have two such jump tables, where code in the first table handles atomic sequence rewinding and then jumps to the second (actual) interrupt handling function specified by the user.  This is not necessary, if the compiler can insert the atomic sequence rewind code snippet (according to the currently selected atomic model) at the beginning of an interrupt handler function.  Moreover, the compiler can omit the atomic rewind sequence if...
* an ISR doesn't call any other (non-inlined) functions
* an ISR doesn't contain any atomic sequences itself

For example, a very simple timer ISR that just increments the high word of the system's free running timer doesn't need to handle any atomics.

Another example would be an SCI/SCIF ISR that receives/sends data to/from a buffer.  In this case, the ISR probably has two code execution paths.  While the receive/send operation is still in progress the code path is probably very short and updates only a few internal variables, which doesn't need any atomic handling.  When the receive/send operation has completed, the completion condition has to be propagated to user code and most likely requires calling other functions and thus handling atomic sequences.  Ideally, the compiler would insert atomic handling code in the ISR only to those paths, which actually need it.  This would reduce interrupt handling time.

I'd propose to add a new function attribute 'handle_atomics' that can be used in combination with the 'interrupt_handler' function attribute for this.
SH3* and SH4* targets could also benefit from this.

Comment 17 Oleg Endo 2015-09-21 01:54:11 UTC

(In reply to Oleg Endo from comment #16)
> Another thing that might be useful for dealing with atomics on SH1* and SH2*
> targets is to make the compiler generate the rewind sequences in interrupt
> handler functions.

This is not specific to SH2A atomics.  I've opened a new PR 67660 for this.