This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH, alpha]: Fix libgomp.c/autopar-1.c execution test failure
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: Richard Henderson <rth at redhat dot com>
- Date: Tue, 25 Aug 2009 23:08:52 +0200
- Subject: [PATCH, alpha]: Fix libgomp.c/autopar-1.c execution test failure
Hello!
This one was surprisingly hard and the fact that alpha thread debugging
is currently FUBAR on Ubuntu 2.6.30.5 was not helpful at all :(
In the end, the problem was found in the LL/SC sequence that implements
sync_compare_and_swap functionality. Looking at the Alpha Architecture
Handbook [1], Notes for Section 4.2.4, the LL/SC link will always fail for
various things, such as an FP insn, a memory access or a branch in between
LL and SC.
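For context, a minimal sketch of the kind of C source that ends up in such
a sequence: an atomic floating-point add built on a compare-and-swap retry
loop. The helper name atomic_add_double is hypothetical and is not taken
from the testcase; __sync_val_compare_and_swap is the GCC builtin that the
sync_compare_and_swap pattern implements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: atomically add INC to the double whose bit
   pattern is stored in *SLOT, and return the new value.  */
double atomic_add_double(uint64_t *slot, double inc)
{
    assert(sizeof(double) == sizeof(uint64_t));
    for (;;) {
        /* Snapshot the current bits and compute the updated value.  */
        uint64_t old_bits = *(volatile uint64_t *)slot;
        uint64_t new_bits;
        double old_val, new_val;

        memcpy(&old_val, &old_bits, sizeof old_val);
        new_val = old_val + inc;
        memcpy(&new_bits, &new_val, sizeof new_bits);

        /* The builtin returns the value it found in *SLOT; the swap
           happened only if that value equals OLD_BITS.  */
        if (__sync_val_compare_and_swap(slot, old_bits, new_bits) == old_bits)
            return new_val;
    }
}
```

On a target with LL/SC, the builtin call is where the compiler emits the
load-locked / store-conditional pair; the FP add itself stays outside it.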
So, it happens that various optimization passes after split-after-reload
mangled the LL/SC sequence into:
?> 120000c7c: 00 40 00 60 mb
120000c80: 0b 14 4b 59 addt $f10,$f11,$f11
120000c84: 02 0e 7f 71 ftoit $f11,t1
-> 120000c88: 00 00 70 ac ldq_l t2,0(a0)
120000c8c: a4 05 61 40 cmpeq t2,t0,t3
*> 120000c90: 06 00 80 f4 bne t3,120000cac <foo.$loopfn.0+0xfc>
120000c94: a1 05 61 40 cmpeq t2,t0,t0
120000c98: 02 04 e3 47 mov t2,t1
120000c9c: 08 00 20 e4 beq t0,120000cc0 <foo.$loopfn.0+0x110>
120000ca0: 00 00 5e a7 ldq ra,0(sp)
120000ca4: 20 00 de 23 lda sp,32(sp)
120000ca8: 01 80 fa 6b ret
120000cac: 04 04 e2 47 mov t1,t3
-> 120000cb0: 00 00 90 bc stq_c t3,0(a0)
120000cb4: f4 ff 9f e4 beq t3,120000c88 <foo.$loopfn.0+0xd8>
120000cb8: 00 40 00 60 mb
And yeah... the branch at 0x120000c90 to 0x120000cac in the LL/SC sequence
broke the link, since it jumped to SC when the compare-and-swap arguments
were equal.
Patched gcc creates the following sequence:
? mb
addt $f10,$f11,$f11
ftoit $f11,$2
$L12:
> ldq_l $3,0($16)
cmpeq $3,$1,$4
beq $4,$L13
mov $2,$4
> stq_c $4,0($16)
beq $4,$L12
mb
So, there is no _TAKEN_ branch inside the LL/SC sequence. This sequence
releases the lock as expected (and is AFAICS the same as the 64-bit cmpxchg
sequence in linux-2.6/arch/alpha/include/asm/xchg.h, modulo the wrongly
predicted backward branch at the end of our sequence).
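The difference between the two layouts can be illustrated with a toy model
in C. This is only a sketch under stated assumptions: the lock flag is
modelled as a single bool that any taken branch clears (as observed on the
EV68AL, not as an architectural guarantee), and all names (struct cpu,
cas_broken, cas_fixed) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one CPU's lock flag; not real Alpha semantics.  */
struct cpu { bool lock_flag; };

uint64_t load_locked(struct cpu *c, const uint64_t *mem)
{
    c->lock_flag = true;              /* ldq_l sets the flag */
    return *mem;
}

bool store_conditional(struct cpu *c, uint64_t *mem, uint64_t val)
{
    bool ok = c->lock_flag;           /* stq_c succeeds only if still set */
    if (ok)
        *mem = val;
    c->lock_flag = false;
    return ok;
}

/* Assumption: a taken branch clears the flag.  */
void taken_branch(struct cpu *c) { c->lock_flag = false; }

/* Mangled layout: stq_c is reached only via a taken branch.
   Returns the number of attempts, or -1 after MAX_TRIES failures.  */
int cas_broken(struct cpu *c, uint64_t *mem, uint64_t oldv,
               uint64_t newv, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 1; i <= max_tries; i++) {
        uint64_t cur = load_locked(c, mem);
        if (cur != oldv)
            return i;                 /* fall through to the return path */
        taken_branch(c);              /* bne to the out-of-line stq_c */
        if (store_conditional(c, mem, newv))
            return i;
    }
    return -1;
}

/* Patched layout: the only taken branch before stq_c leaves the sequence.  */
int cas_fixed(struct cpu *c, uint64_t *mem, uint64_t oldv,
              uint64_t newv, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 1; i <= max_tries; i++) {
        uint64_t cur = load_locked(c, mem);
        if (cur != oldv) {
            taken_branch(c);          /* beq to the exit label */
            return i;
        }
        if (store_conditional(c, mem, newv))
            return i;                 /* stq_c reached by fall-through */
    }
    return -1;
}
```

In this model cas_broken never succeeds when the old value matches, which
corresponds to the endless retry loop behind the test timeout, while
cas_fixed succeeds on the first attempt.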
The actual gcc problem is in the bbro pass, which for some reason takes the
hot BB out of the main sequence. The proposed solution is to limit compiler
creativity by delaying insn splitting until after the bbro pass,
conditionalizing the split on epilogue_completed instead of
reload_completed. There is no problem with the follow-up scheduling pass,
since both LL and SC insns are unspec_volatiles, which effectively act as
scheduling barriers.
The attached patch applies the same approach to all other sync LL/SC
sequences, so generated sequences will more or less stay in the original
expanded form.
Yes, the comment in sync.md was wrong. Taken branches clear the lock, at
least on
cpu : Alpha
cpu model : EV68AL
cpu variation : 7
cpu revision : 0
BTW: Linux does not need a memory barrier in front of the LL/SC sequence.
I will propose a follow-up patch that removes it.
2009-08-25 Uros Bizjak <ubizjak@gmail.com>
* config/alpha/sync.md: Update comment about unpredictable LL/SC lock
clearing by a taken branch.
(sync_<fetchop_name><mode>): Split when epilogue_completed is set,
effectively after bbro pass.
(sync_nand<mode>): Ditto.
(sync_old_<fetchop_name><mode>): Ditto.
(sync_old_nand<mode>): Ditto.
(sync_new_<fetchop_name><mode>): Ditto.
(sync_new_nand<mode>): Ditto.
(sync_compare_and_swap<mode>_1): Ditto.
(*sync_compare_and_swap<mode>): Ditto.
(sync_lock_test_and_set<mode>_1): Ditto.
(sync_lock_test_and_set<mode>): Ditto.
The patch was tested on a 4-processor alphaev68-unknown-linux-gnu. The
patch fixes:
WARNING: program timed out.
FAIL: libgomp.c/autopar-1.c execution test
in libgomp testsuite.
OK for mainline and after a week or two for 4.4 and 4.3?
[1] http://www.comms.scitech.susx.ac.uk/fft/programming/alphaahb.pdf
Uros.
Attachment:
a.diff.txt
Description: Text document