According to: http://gcc.gnu.org/ml/gcc/2011-01/msg00314.html http://gcc.gnu.org/ml/gcc/2011-01/msg00279.html The tail call optimization should not be possible on _ITM_commitTransaction because this function can go (longjmp) to _ITM_beginTransaction and thus the stack will be corrupted. Here ASM: _function: pushq %r12 pushq %rbp pushq %rbx subq $16, %rsp ... call _ITM_beginTransaction .L8: ... call _ITM_WU8 addq $16, %rsp popq %rbx popq %rbp popq %r12 jmp _ITM_commitTransaction In this example, if _ITM_commitTransaction rolls back to L8, the function epilogue will restore wrong values to registers (modified by the function _ITM_commitTransaction) when leaving the _function. Moreover I think that the function _ITM_beginTransaction should have the flag ECS_RETURNS_TWICE because it is how it behaves (ie the transaction can abort and longjmp behind the _ITM_beginTransaction call). Draft patch proposed in http://gcc.gnu.org/ml/gcc/2011-01/msg00279.html Patrick Marlier.
Mine. We aren't representing these functions as setjmp because we have more detailed information about the control flow, and it's (eventually) explicitly represented. For instance, if a function has two transactions we know that the commit for the second transaction cannot branch to the restart of the first transaction. The problem here is that these extra edges are not inserted until after the tail-call discovery pass is run. Thankfully, no actual transform is actually performed at that spot. All we have to do later is clear the bit that the tail-call discovery pass set.
Author: rth Date: Wed Feb 9 20:23:56 2011 New Revision: 169984 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169984 Log: PR 47530 * trans-mem.c (expand_block_edges): Reset tail-call bit. Added: branches/transactional-memory/gcc/testsuite/g++.dg/tm/pr47530.C Modified: branches/transactional-memory/gcc/ChangeLog.tm branches/transactional-memory/gcc/trans-mem.c
Fixed.
I hope next time to come up with a 'perfect' patch! Thanks for the explanation and for the fix. :) Patrick.
Created attachment 27288 [details] Preprocessed RBTree microbenchmark from RSTM that triggers this bug. I'm still seeing this bug with gcc-4.8 (svn r187050). I've attached a preprocessed source file from the RSTM microbenchmark suite (for fc13, x86_64) that manifests the problem. It simply takes an integer set, implemented as an RBTree, and randomly performs insert/delete/lookup operations as transactions. The "p" command line parameter sets the number of threads to use. The rest of the command line parameters aren't relevant. Unoptimized compilation produces expected output: luke> g++ -fgnu-tm -o TreeBench TreeBench.i -lrt luke> ./TreeBench -p2 Verification: Passed csv, ALG=gcc-tm, B=RBTree, R=34, d=1, p=2, X=0, m=256, S=1, O=1, txns=1231838, time=1000001329, throughput=1231836 Optimized compilation shows stack corruption ("time" output is computed using a stack location that is trashed when _ITM_commitTransaction longjmps from the tail call). luke> g++ -O3 -fgnu-tm -o TreeBench TreeBench.i -lrt luke> ./TreeBench -p2 Verification: Passed csv, ALG=gcc-tm, B=RBTree, R=34, d=1, p=2, X=0, m=256, S=1, O=1, txns=1301244673, time=1335976459721524108, throughput=0 Tail (sibling?) calls to _ITM_commitTransaction are identified by 'jmp' instructions. luke> g++ -O3 -fgnu-tm -o /dev/stdout -S TreeBench.i | grep _ITM_commitTransaction call _ITM_commitTransaction jmp _ITM_commitTransaction jmp _ITM_commitTransaction I expect that this is also related to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51771--and in fact may be the same bug, but it's manifesting as the specific tail-call problem that was noted here as fixed. The returns_twice "hack" that suppresses pr51771 in gcc-4.7 also suppresses this bug (pr47530) in gcc-4.7, so it's gcc-4.8-only at this point. I will reopen pr47530, but if someone can confirm it as really being pr51771 I'm happy to have it closed again.
(In reply to comment #5) > Created attachment 27288 [details] > Preprocessed RBTree microbenchmark from RSTM that triggers this bug. > > [snip] > > I will reopen pr47530, but if someone can confirm it as really being pr51771 > I'm happy to have it closed again. Of course, what I mean is that this can be re-opened for 4.8, if appropriate. For now, gcc-4.7 is fine.
Reopened with Luke's testcase, which triggers on both 4.7 and 4.8. Luke, in the future, it would really help if you could narrow down the testcase to the smallest reproducible testcase. This helps us immensely and expedites any possible fix.
Created attachment 27555 [details] reduced testcase for luke's sample
Aldy, I have a testcase and a patch for this. I will submit it soon. Patrick
Created attachment 27556 [details] propsed patch being tested Whoops, sorry Patrick. I already have a patch I am testing :). If you already have yours tested, feel free to submit it, otherwise I'll post shortly.
Created attachment 27557 [details] testcase (In reply to comment #10) > Created attachment 27556 [details] > propsed patch being tested > > Whoops, sorry Patrick. I already have a patch I am testing :). If you already > have yours tested, feel free to submit it, otherwise I'll post shortly. Go ahead and submit (I will not able to do it before few hours). My patch was quite similar except for the testcase (using the tree dump but untested). Testcase attached if you want to include it into the testsuite.
Author: aldyh Date: Mon Jun 4 16:51:24 2012 New Revision: 188190 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188190 Log: PR middle-end/47530 * trans-mem.c (expand_block_edges): Do not skip the first statement when resetting the BB. Added: trunk/gcc/testsuite/g++.dg/tm/pr47530-2.C Modified: trunk/gcc/ChangeLog trunk/gcc/trans-mem.c
Author: aldyh Date: Mon Jun 4 16:52:47 2012 New Revision: 188191 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188191 Log: PR middle-end/47530 * trans-mem.c (expand_block_edges): Do not skip the first statement when resetting the BB. Added: branches/gcc-4_7-branch/gcc/testsuite/g++.dg/tm/pr47530-2.C Modified: branches/gcc-4_7-branch/gcc/ChangeLog branches/gcc-4_7-branch/gcc/trans-mem.c
fixed on 4.7 and trunk. closing PR.