Using gcc 4.2 snapshot 20061007 on x86, the compiler is miscompiling glibc 2.5 (see http://sourceware.org/bugzilla/show_bug.cgi?id=3328) What is happening is that a test of a variable after an inline asm block is moved around so that the load of the struct member is moved above the asm block even though it should not do that. The miscompilation only seems to appear in that particular constellation, all tries to reproduce the problem using a simpler hand-made test case have failed. So I've attached the whole preprocessed source file. Compiler simply called with "gcc -O2 -o test.o -c pthread_mutex_lock.c". The problem appears in the function __pthread_mutex_lock. There's a switch statement with some common stuff in the different cases which the compiler seems to optimize. The case in question (the most common one): switch (__builtin_expect (mutex->__data.__kind, PTHREAD_MUTEX_TIMED_NP)) { ... case PTHREAD_MUTEX_TIMED_NP: simple: /* Normal mutex. */ LLL_MUTEX_LOCK (mutex->__data.__lock); assert (mutex->__data.__owner == 0); break; ... } The LLL_MUTEX_LOCK is a macro with an inline assembler statement (cmpxchg and a conditional jump to another section) that is annotated to tell the compiler that it clobbers memory, so gcc should never move loads around the inline asm block. But in this case the load from mutex->__data.__owner (for the assertion) is executed above LLL_MUTEX_LOCK and the register content is tested below LLL_MUTEX_LOCK. The content will have changed, but the old value in the register is wrong and the assertion triggers. The broken generated code looks like (note that the jump to <_L_mutex_lock_78> is just an out-of-line call to another function that can wait for some time, modify the mutex->__data.__owner = 0x8(%ebx) here, and then jump back): loading mutex->__data.__owner (into %edx): 2f: 8b 53 08 mov 0x8(%ebx),%edx lll_mutex_lock: 32: 31 c0 xor %eax,%eax 34: b9 01 00 00 00 mov $0x1,%ecx 39: f0 0f b1 0b lock cmpxchg %ecx,(%ebx) 3d: 0f 85 f3 05 00 00 jne 636 <_L_mutex_lock_78> testing mutex->__data.__owner (%edx from above!): 43: 85 d2 test %edx,%edx 45: 0f 85 7f 04 00 00 jne 4ca <__pthread_mutex_lock+0x4ca> -> __assert_fail In constrast, gcc 4.1.1 generated the following correct piece of code: acquiring mutex (lll_mutex_lock): 3d: 31 c0 xor %eax,%eax 3f: b9 01 00 00 00 mov $0x1,%ecx 44: f0 0f b1 0b lock cmpxchg %ecx,(%ebx) 48: 0f 85 ed 05 00 00 jne 63b <_L_mutex_lock_86> loading and testing owner: 4e: 8b 43 08 mov 0x8(%ebx),%eax 51: 85 c0 test %eax,%eax 53: 0f 85 71 05 00 00 jne 5ca <__pthread_mutex_lock+0x5ca> -> __assert_fail
Created attachment 12404 [details] preprocessed pthread_mutex_lock.c causing the miscompiled code Call with "gcc -O2 -o test.o -c pthread_mutex_lock.c" on i686 and look at the beginning of the generated __pthread_mutex_lock function, around the first cmpxchg call.
__asm __volatile ( "call *_dl_sysinfo\n\t" : "=a" (__status) : "0" (240), "b" (&mutex->__data.__lock), "S" (0), "c" (0), "d" (_val), "i" (__builtin_offsetof (tcbhead_t, sysinfo)) : "memory"); I think what you are doing here is invalid, you cannot change the control flow via an inline-asm.
No, that's perfectly valid, you can't jump out of an asm or jump into it, but if you enter the asm and exit it, it doesn't matter what branches or calls were used inside it (of course if the function you call inside it is written in C you need to add used attribute to it to make sure it is not optimized out if it is not otherwise referenced).
(In reply to comment #3) > No, that's perfectly valid, you can't jump out of an asm or jump into it, > but if you enter the asm and exit it, it doesn't matter what branches or calls > were used inside it (of course if the function you call inside it is written > in C you need to add used attribute to it to make sure it is not optimized > out if it is not otherwise referenced). Not really, it is still questionable but it is unrelated to the problem as far as I can tell. The problem is related to struct aliasing, here is a short testcase which shows the problem (for PPC): typedef struct { int t; } pthread_mutex_t; int t; int f(pthread_mutex_t *a) { a->t = 1; asm("stw%X0 %1,%0" ::"r"(a->t) :"r"(3) : "memory"); return a->t + t; } int main(void) { pthread_mutex_t a; if (f(&a)!=3) __builtin_abort (); } We should not get 1+t but a->t + t in the .final_cleanup as the asm can clober memory.
(In reply to comment #4) > asm("stw%X0 %1,%0" ::"r"(a->t) :"r"(3) : "memory"); This is what I get for making a runtime testcase in the bug report. That asm should be: asm("stw%X0 %1,%0" ::"r"(a->t) ,"r"(3) : "memory");
(In reply to comment #5) > (In reply to comment #4) > > asm("stw%X0 %1,%0" ::"r"(a->t) :"r"(3) : "memory"); > This is what I get for making a runtime testcase in the bug report. > That asm should be: > asm("stw%X0 %1,%0" ::"r"(a->t) ,"r"(3) : "memory"); Grrr, lets try that again, this time for the correct PPC asm: asm("stw %1,0(%0)" ::"r"(a->t) ,"r"(3) : "memory");
Actually, AFAICT this is a variant of the struct aliasing bug zdenek reported. The real problem is that we eliminate the false aliases and because their is no real addressable variable here, the asm's miss out on the SMT's.
I've verified my fix for the other two bugs will fix this (sorry that one is taking so long).
Note Mark has requested the priorities for new regressions stay at P3 so he can see the new regressions and prioritize them himself. Reverting back to P3 for that reason.
mine
Subject: Bug 29415 Author: dberlin Date: Thu Oct 19 23:05:53 2006 New Revision: 117891 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=117891 Log: 2006-10-19 Daniel Berlin <dberlin@dberlin.org> Fix PR tree-optimization/28778 Fix PR tree-optimization/29156 Fix PR tree-optimization/29415 * tree.h (DECL_PTA_ARTIFICIAL): New macro. (tree_decl_with_vis): Add artificial_pta_var flag. * tree-ssa-alias.c (is_escape_site): Remove alias info argument, pushed into callers. * tree-ssa-structalias.c (nonlocal_for_type): New variable. (nonlocal_all): Ditto. (struct variable_info): Add directly_dereferenced member. (var_escaped_vars): New variable. (escaped_vars_tree): Ditto. (escaped_vars_id): Ditto. (nonlocal_vars_id): Ditto. (new_var_info): Set directly_dereferenced. (graph_size): New variable (build_constraint_graph): Use graph_size. (solve_graph): Don't process constraints that cannot change the solution, don't try to propagate an empty solution to our successors. (process_constraint): Set directly_dereferenced. (could_have_pointers): New function. (get_constraint_for_component_ref): Don't process STRING_CST. (nonlocal_lookup): New function. (nonlocal_insert): Ditto. (create_nonlocal_var): Ditto. (get_nonlocal_id_for_type): Ditto. (get_constraint_for): Allow results vector to be empty in the case of string constants. Handle results of calls properly. (update_alias_info): Update alias info stats on number and type of calls. (find_func_aliases): Use could_have_pointers. (make_constraint_from_escaped): Renamed from make_constraint_to_anything, and changed to make constraints from escape variable. (make_constraint_to_escaped): New function. (find_global_initializers): Ditto. (create_variable_info_for): Make constraint from escaped to any global variable, and from any global variable to the set of escaped vars. (intra_create_variable_infos): Deal with escaped instead of pointing to anything. (set_uids_in_ptset): Do type pruning on directly dereferenced variables. (find_what_p_points_to): Adjust call to set_uids_with_ptset. (init_base_vars): Fix comment, and initialize escaped_vars. (need_to_solve): Removed. (find_escape_constraints): New function. (expand_nonlocal_solutions): Ditto. (compute_points_to_sets): Call find_escape_constraints and expand_nonlocal_solutions. (delete_points_to_sets): Don't fall off the end of the graph. (init_alias_heapvars): Initialize nonlocal_for_type and nonlocal_all. (delete_alias_heapvars): Free nonlocal_for_type and null out nonlocal_all. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr28778.c trunk/gcc/testsuite/gcc.c-torture/execute/pr29156.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/pta-fp.c trunk/gcc/tree-ssa-alias.c trunk/gcc/tree-ssa-operands.c trunk/gcc/tree-ssa-structalias.c trunk/gcc/tree-ssa-structalias.h trunk/gcc/tree.h
Fixed.
Is there a release of gcc where is this problem fixed, or is it only in the repository? I've been running into this problem for the last 2 days and was working around the clock, assuming that something was wrong with my code. (In reply to comment #12) > Fixed. >
As you can see from the target milestone it should be fixed in all released versions (it was only broken during 4.2.0 development).
Right - I guess I should make the Gentoo folks aware of this, because in their current tree anything >= 4.2.0 is marked as unstable. Thanks for the reply. (In reply to comment #14) > As you can see from the target milestone it should be fixed in all released > versions (it was only broken during 4.2.0 development). >