The SH specific atomic functions don't seem to work properly with the SH2A processor. An example function from the gcc/config/sh/linux-atomic.asm is as follows: #define ATOMIC_FETCH_AND_OP(OP,N,T) \ .global __sync_fetch_and_##OP##_##N; \ HIDDEN_FUNC(__sync_fetch_and_##OP##_##N); \ .align 2; \ __sync_fetch_and_##OP##_##N:; \ mova 1f, r0; \ mov r15, r1; \ mov #(0f-1f), r15; \ <<< SP modification 0: mov.##T @r4, r2; \ OP r2, r5; \ mov.##T r5, @r4; \ 1: mov r1, r15; \ <<< SP modification rts; \ mov r2, r0; \ ENDFUNC(__sync_fetch_and_##OP##_##N) Functions following this style cause the application to eventually crash. I think this may be related to the locking mechanism around the instructions making use of an on-board MMU, since the SP is loaded with a negative value. As the SH2A does not have an MMU this poses a problem. Replacing the SP modifying code with direct interrupt disabling/enabling resolves the problem and the application no longer crashes: #define ATOMIC_FETCH_AND_OP(OP,N,T,EXT) \ .global __sync_fetch_and_##OP##_##N; \ HIDDEN_FUNC(__sync_fetch_and_##OP##_##N); \ .align 2; \ __sync_fetch_and_##OP##_##N:; \ stc sr, r0; \ mov r0, r1; \ or #0xf0, r0; \ ldc r0, sr; \ <<< interrupt disable mov.##T @r4, r2; \ OP r2, r5; \ mov.##T r5, @r4; \ ldc r1, sr; \ <<< interrupt restore rts; \ EXT r2, r0; \ ENDFUNC(__sync_fetch_and_##OP##_##N)
(In reply to comment #0) > The SH specific atomic functions don't seem to work properly with the SH2A > processor. An example function from the gcc/config/sh/linux-atomic.asm is as > follows: > > [...] > > Functions following this style cause the application to eventually crash. I > think this may be related to the locking mechanism around the instructions > making use of an on-board MMU, since the SP is loaded with a negative value. > As the SH2A does not have an MMU this poses a problem. This is the default GNU/Linux gUSA ABI, which signals the 'in atomic sequence' condition with a negative SP in the range somewhere between -128...-1. The interrupt/exception handling code must check for this condition before transferring control to some other execution context (other thread or direct interrupt handler). Please notice that starting with GCC 4.7 the atomic sequence implementation has changed and now has to be enabled with the -msoft-atomics option. For a slightly more detailed description of the machinery please see the comments in gcc/config/sh/sync.md. > Replacing the SP modifying > code with direct interrupt disabling/enabling resolves the problem and the > application no longer crashes: > > [...] While flipping interrupts on/off might fix the problem it is not recommended to do so because of performance reasons (slower code on average, increased interrupt latency etc). It is better and easier to fix the interrupt/exception handling code instead.
As described in comment #1 this is not a compiler issue, but rather a system/OS interrupt/exception handling issue.
Uhm, I've been thinking .... In fact, the current atomic sequences (-msoft-atomic) are not compatible with anything but SH3* and SH4*. This is because everything < SH3 (including SH2A) pushes SR and PC on the stack (R15) when an exception occurs, and for that the stack pointer must always be valid. The crashes mentioned in the original description are caused by the invalid stack pointer. I'm sorry for not noticing this when replying/closing this PR prematurely. I don't know how linux/glibc have been handling atomic ops on SH2 or SH2A, but I've got an idea that would work in a bare-metal setup. Instead of signaling the 'is in atomic sequence' condition through R15, it can be signaled through a thread local variable. For example a compare_and_swap sequence might look like this: mov #(0f-1f),r1 ! sequence length is held in R1 mova 1f,r0 .align 2 mov.l r0,@(#x,gbr) ! enter atomic sequence by setting the ! exit point to the thread local variable. ! #x is user defined through -m option 0: mov.<bwl> @r1,%0 mov #0,r0 cmp/eq %0,%4 bf 1f mov.<bwl> %3,@%1 1: mov.l r0,@(#x,gbr) ! set thread local variable to nullptr, ! which exits the atomic sequence. The check in the exception handling code for the 'is in atomic sequence' would be: @(#x,gbr) != 0 && @(0,r15) < @(#x,gbr) The atomic sequence rewind step in the exception handling code would be: @(0,r15) = @(#x,gbr) + r1 My assumption here is that gbr points to the execution context housekeeping structure, which is also accessible from user space. I've briefly checked the code of glibc and this seems to be the case. Problems will occur when the gbr is modified by user code (inline asm etc). So user code should not modify the gbr, or at least do it in a controlled way. But I think this issue can be ignored for now. The execution context struct must be modified to hold the additional variable for atomic sequences. The thread library / kernel will need a modification (the kernel will need a modification on SH2(A) anyway), so that @(#x,gbr) is initialized to zero on thread creation. Kaz, could you please provide some feedback on this idea?
(In reply to comment #3) > I don't know how linux/glibc have been handling atomic ops on SH2 or SH2A, but > I've got an idea that would work in a bare-metal setup. There is no glibc port for SH2* and highly unlikely someone will add it. SH2* linux uses and will use totally different environment instead of glibc/nptl. Those CPUs have no user/privilege modes and folks tend to use disable/enable interrupts for the atomicity on them like as in #0. *-linux configuration of gcc assumes glibc/nptl everywhere. It means that only sh[34]*-linux configuration will work well. For this PR, the current atomic functions or builtins should be disabled for sh2*-linux configuration and add something that works on SH2*. You would have a free hand to the implementation in this case, since currently there is nothing correct for SH2*.
(In reply to comment #4) > There is no glibc port for SH2* and highly unlikely someone will > add it. SH2* linux uses and will use totally different environment > instead of glibc/nptl. Those CPUs have no user/privilege modes and > folks tend to use disable/enable interrupts for the atomicity on them > like as in #0. > *-linux configuration of gcc assumes glibc/nptl everywhere. It means > that only sh[34]*-linux configuration will work well. For this PR, > the current atomic functions or builtins should be disabled for > sh2*-linux configuration and add something that works on SH2*. > You would have a free hand to the implementation in this case, > since currently there is nothing correct for SH2*. Thanks for the feedback. I'd like to propose to do the following ... - Add new option to select atomics model -matomic= With the following supported models: soft-gusa (SH3*, SH4* only) soft-tcb,gbr-offset=xxxx (SH1*,SH2*,SH3*,SH4*) hard-llcs (SH4A only) I think this is easier to handle than multiple individual options. - Make current option -msoft-atomic an alias for -matomic=soft-gusa. - Remove option -mhard-atomic. It's new in 4.8 and thus we don't need an alias for that. - Issue compiler error on unsupported atomic model for a given target (e.g. -m2a -matomic=soft-gusa --> error) - Add TCB (thread control block) soft atomics variations. BTW, they would be compatible across all SH 1..4 types. E.g. running precompiled SH2 libraries/binaries on SH4 would work -- if anyone wanted to do so. Does this make sense?
(In reply to comment #5) > Does this make sense? Sounds reasonable.
Author: olegendo Date: Mon Oct 1 08:34:02 2012 New Revision: 191899 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191899 Log: PR target/50457 * config/sh/sh.opt (matomic-model): New option. (msoft-atomic): Mark as deprecated and alias to matomic-model=soft-gusa. (mhard-atomic): Delete. * config/sh/predicates.md (gbr_displacement): New predicate. * config/sh/sh-protos.h (sh_atomic_model): New struct. (selected_atomic_model): New declaration. (TARGET_ATOMIC_ANY, TARGET_ATOMIC_STRICT, TARGET_ATOMIC_SOFT_GUSA, TARGET_ATOMIC_HARD_LLCS, TARGET_ATOMIC_SOFT_TCB, TARGET_ATOMIC_SOFT_TCB_GBR_OFFSET_RTX, TARGET_ATOMIC_SOFT_IMASK): New macros. * config/sh/linux.h (SUBTARGET_OVERRIDE_OPTIONS): Adapt setting to default atomic model. * config/sh/sh.c (selected_atomic_model_): New global variable. (selected_atomic_model, parse_validate_atomic_model_option): New functions. (sh_option_override): Replace atomic selection checks with call to parse_validate_atomic_model_option. * config/sh/sh.h (TARGET_ANY_ATOMIC, UNSUPPORTED_ATOMIC_OPTIONS, UNSUPPORTED_HARD_ATOMIC_CPU): Delete. (DRIVER_SELF_SPECS): Remove atomic checks. config/sh/sync.md: Update documentation comments. (atomic_compare_and_swap<mode>, atomic_exchange<mode>, atomic_fetch_<fetchop_name><mode>, atomic_fetch_nand<mode>, atomic_<fetchop_name>_fetch<mode>, atomic_nand_fetch<mode>): Use TARGET_ATOMIC_ANY as condition. Add TARGET_ATOMIC_STRICT check for SH4A case. Handle new TARGET_ATOMIC_SOFT_TCB and TARGET_ATOMIC_SOFT_IMASK cases. (atomic_test_and_set): Handle new TARGET_ATOMIC_SOFT_TCB and TARGET_ATOMIC_SOFT_IMASK cases. (atomic_compare_and_swapsi_hard, atomic_exchangesi_hard, atomic_fetch_<fetchop_name>si_hard, atomic_fetch_nandsi_hard, atomic_<fetchop_name>_fetchsi_hard, atomic_nand_fetchsi_hard): Add TARGET_ATOMIC_STRICT check. (atomic_compare_and_swap<mode>_hard, atomic_exchange<mode>_hard, atomic_fetch_<fetchop_name><mode>_hard, atomic_fetch_nand<mode>_hard, atomic_<fetchop_name>_fetch<mode>_hard, atomic_nand_fetch<mode>_hard, atomic_test_and_set_hard): Use TARGET_ATOMIC_HARD_LLCS condition. (atomic_compare_and_swap<mode>_soft, atomic_exchange<mode>_soft, atomic_fetch_<fetchop_name><mode>_soft, atomic_fetch_nand<mode>_soft, atomic_<fetchop_name>_fetch<mode>_soft, atomic_nand_fetch<mode>_soft, atomic_test_and_set_soft): Append _gusa to the insn names and use TARGET_ATOMIC_SOFT_GUSA as condition. (atomic_compare_and_swap<mode>_soft_tcb, atomic_exchange<mode>_soft_tcb, atomic_fetch_<fetchop_name><mode>_soft_tcb, atomic_fetch_nand<mode>_soft_tcb, atomic_<fetchop_name>_fetch<mode>_soft_tcb, atomic_nand_fetch<mode>_soft_tcb, atomic_test_and_set_soft_tcb): New insns. (atomic_compare_and_swap<mode>_soft_imask, atomic_exchange<mode>_soft_imask, atomic_fetch_<fetchop_name><mode>_soft_imask, atomic_fetch_nand<mode>_soft_imask, atomic_<fetchop_name>_fetch<mode>_soft_imask, atomic_nand_fetch<mode>_soft_imask, atomic_test_and_set_soft_imask): New insns. * doc/invoke.texi (SH Options): Document new matomic-model option. Remove msoft-atomic and mhard-atomic options. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/linux.h trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh-protos.h trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.h trunk/gcc/config/sh/sh.opt trunk/gcc/config/sh/sync.md trunk/gcc/doc/invoke.texi
Hm, we still have libgcc/config/sh/linux-atomic.S around, which provides gUSA only implementations of atomic functions. How about replacing this with a C implementation instead? The C functions would then simply invoke the __builtin atomic functions. Would that be OK to do?
(In reply to comment #8) > Would that be OK to do? Yes, that sounds good for maintenance.
(In reply to comment #9) > (In reply to comment #8) > > Would that be OK to do? > > Yes, that sounds good for maintenance. OK, I've already got something here, but I'm confused about one thing. In config/sh/linux-atomic.S the function names go like ... .global __sync_lock_test_and_set_##N; \ HIDDEN_FUNC(__sync_lock_test_and_set_##N); \ (two leading underscores) When no atomic model is selected the compiler will turn this int test (int* x, int y, int z) { return __sync_val_compare_and_swap (x, y, z); } into: .global _test .type _test, @function _test: mov.l .L2,r0 ! 11 movsi_ie/1 [length = 2] jmp @r0 nop ! 12 sibcall_valuei [length = 4] .L3: .align 2 .L2: .long ___sync_val_compare_and_swap_4 (three leading underscores) Looking at config/arm/linux-atomic.c, the C functions all start with two leading underscores, which will result in three leading underscores in the .asm file. How does this fit together in the SH case? The compiler generates refs to '___sync*' but in the .S file there are only '__sync' symbols. I'm afraid I'm missing something here...
Created attachment 28321 [details] cleanup libgcc/config/sh/linux-atomic This is what I've got so far as a cleanup of the libgcc/config/sh/linux-atomic* stuff. I had to add built-in defines for the atomic model and while at it, I've moved the TARGET_CPU_CPP_BUILTINS macro implementation from sh.h to sh-c.c, as it is done on some other targets. Kaz, could you please try out this patch? I'm not sure, but maybe it will require adding symbol aliases for __sync* -> ___sync* ...
(In reply to comment #11) > Created attachment 28321 [details] > cleanup libgcc/config/sh/linux-atomic Works fine for me. > (three leading underscores) You might get one extra underscore because embed-elf SH compiler uses one underscore as USER_LABEL_PREFIX. It's OK for linux which uses null USER_LABEL_PREFIX. I've got .long __sync_val_compare_and_swap_4 with sh4-unknown-linux-gnu gcc for that case.
(In reply to comment #12) > (In reply to comment #11) > > Created attachment 28321 [details] > > cleanup libgcc/config/sh/linux-atomic > > Works fine for me. > > > (three leading underscores) > > You might get one extra underscore because embed-elf SH compiler uses one > underscore as USER_LABEL_PREFIX. It's OK for linux which uses null > USER_LABEL_PREFIX. I've got > > .long __sync_val_compare_and_swap_4 > > with sh4-unknown-linux-gnu gcc for that case. Thanks for the explanation! Patch submitted: http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00162.html
Author: olegendo Date: Wed Oct 3 21:36:14 2012 New Revision: 192051 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192051 Log: PR target/50457 * config/sh/sh.c (parse_validate_atomic_model_option): Handle name strings in sh_atomic_model. * config/sh/sh.h (TARGET_CPU_CPP_BUILTINS): Move macro implementation to ... * config/sh/sh-c.c (sh_cpu_cpp_builtins): ... this new function. Add __SH1__ and __SH2__ defines. Add __SH_ATOMIC_MODEL_*__ define. * config/sh/sh-protos.h (sh_atomic_model): Add name and cdef_name variables. (sh_cpu_cpp_builtins): Declare new function. PR target/50457 * config/sh/linux-atomic.S: Delete. * config/sh/linux-atomic.c: New. * config/sh/t-linux (LIB2ADD): Replace linux-atomic.S with linux-atomic.c. Add cflags to disable warnings. Added: trunk/libgcc/config/sh/linux-atomic.c Removed: trunk/libgcc/config/sh/linux-atomic.S Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh-c.c trunk/gcc/config/sh/sh-protos.h trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.h trunk/libgcc/ChangeLog trunk/libgcc/config/sh/t-linux
Hm, I was wondering whether the atomic model should also be checked in sh_check_pch_target_flags or not?
Another thing that might be useful for dealing with atomics on SH1* and SH2* targets is to make the compiler generate the rewind sequences in interrupt handler functions. On SH3* and SH4* targets, the interrupt handling jump table is usually designed in such a way, that it can contain (small) setup code before a user specified interrupt handling function is invoked. However, on SH1* and SH2* targets, the processor directly jumps to the user specified interrupt handling function via the interrupt jump table. It would be possible to have two such jump tables, where code in the first table handles atomic sequence rewinding and then jumps to the second (actual) interrupt handling function specified by the user. This is not necessary, if the compiler can insert the atomic sequence rewind code snippet (according to the currently selected atomic model) at the beginning of an interrupt handler function. Moreover, the compiler can omit the atomic rewind sequence if... * an ISR doesn't call any other (non-inlined) functions * an ISR doesn't contain any atomic sequences itself For example, a very simple timer ISR that just increments the high word of the system's free running timer doesn't need to handle any atomics. Another example would be an SCI/SCIF ISR that receives/sends data to/from a buffer. In this case, the ISR probably has two code execution paths. While the receive/send operation is still in progress the code path is probably very short and updates only a few internal variables, which doesn't need any atomic handling. When the receive/send operation has completed, the completion condition has to be propagated to user code and most likely requires calling other functions and thus handling atomic sequences. Ideally, the compiler would insert atomic handling code in the ISR only to those paths, which actually need it. This would reduce interrupt handling time. I'd propose to add a new function attribute 'handle_atomics' that can be used in combination with the 'interrupt_handler' function attribute for this. SH3* and SH4* targets could also benefit from this.
(In reply to Oleg Endo from comment #16) > Another thing that might be useful for dealing with atomics on SH1* and SH2* > targets is to make the compiler generate the rewind sequences in interrupt > handler functions. This is not specific to SH2A atomics. I've opened a new PR 67660 for this.