Bug 29415 - [4.2 Regression] bad code reordering around inline asm block
[4.2 Regression] bad code reordering around inline asm block
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: tree-optimization
4.2.0
: P1 blocker
: 4.2.0
Assigned To: Daniel Berlin
: alias, wrong-code
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-10-10 15:38 UTC by Christophe Saout
Modified: 2007-11-30 15:48 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-10-13 17:49:17


Attachments
preprocessed pthread_mutex_lock.c causing the miscompiled code (227.32 KB, text/plain)
2006-10-10 15:40 UTC, Christophe Saout
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Christophe Saout 2006-10-10 15:38:15 UTC
Using gcc 4.2 snapshot 20061007 on x86, the compiler is miscompiling glibc 2.5
(see http://sourceware.org/bugzilla/show_bug.cgi?id=3328)

What is happening is that a test of a variable after an inline asm block is moved around so that the load of the struct member is moved above the asm block even though it should not do that.

The miscompilation only seems to appear in that particular constellation, all tries to reproduce the problem using a simpler hand-made test case have failed. So I've attached the whole preprocessed source file.

Compiler simply called with "gcc -O2 -o test.o -c pthread_mutex_lock.c".

The problem appears in the function __pthread_mutex_lock. There's a switch statement with some common stuff in the different cases which the compiler seems to optimize. The case in question (the most common one):

switch (__builtin_expect (mutex->__data.__kind, PTHREAD_MUTEX_TIMED_NP)) {
...
    case PTHREAD_MUTEX_TIMED_NP:
    simple:
      /* Normal mutex.  */
      LLL_MUTEX_LOCK (mutex->__data.__lock);
      assert (mutex->__data.__owner == 0);
      break;
...
}

The LLL_MUTEX_LOCK is a macro with an inline assembler statement (cmpxchg and a conditional jump to another section) that is annotated to tell the compiler that it clobbers memory, so gcc should never move loads around the inline asm block. But in this case the load from mutex->__data.__owner (for the assertion) is executed above LLL_MUTEX_LOCK and the register content is tested below LLL_MUTEX_LOCK. The content will have changed, but the old value in the register is wrong and the assertion triggers.

The broken generated code looks like (note that the jump to <_L_mutex_lock_78> is just an out-of-line call to another function that can wait for some time, modify the mutex->__data.__owner = 0x8(%ebx) here, and then jump back):

loading mutex->__data.__owner (into %edx):
  2f:   8b 53 08                mov    0x8(%ebx),%edx
lll_mutex_lock:
  32:   31 c0                   xor    %eax,%eax
  34:   b9 01 00 00 00          mov    $0x1,%ecx
  39:   f0 0f b1 0b             lock cmpxchg %ecx,(%ebx)
  3d:   0f 85 f3 05 00 00       jne    636 <_L_mutex_lock_78>
testing mutex->__data.__owner (%edx from above!):
  43:   85 d2                   test   %edx,%edx
  45:   0f 85 7f 04 00 00       jne    4ca <__pthread_mutex_lock+0x4ca>
                                        -> __assert_fail

In constrast, gcc 4.1.1 generated the following correct piece of code:

acquiring mutex (lll_mutex_lock):
  3d:   31 c0                   xor    %eax,%eax
  3f:   b9 01 00 00 00          mov    $0x1,%ecx
  44:   f0 0f b1 0b             lock cmpxchg %ecx,(%ebx)
  48:   0f 85 ed 05 00 00       jne    63b <_L_mutex_lock_86>
loading and testing owner:
  4e:   8b 43 08                mov    0x8(%ebx),%eax
  51:   85 c0                   test   %eax,%eax
  53:   0f 85 71 05 00 00       jne    5ca <__pthread_mutex_lock+0x5ca>
                                        -> __assert_fail
Comment 1 Christophe Saout 2006-10-10 15:40:02 UTC
Created attachment 12404 [details]
preprocessed pthread_mutex_lock.c causing the miscompiled code

Call with "gcc -O2 -o test.o -c pthread_mutex_lock.c" on i686 and look at the beginning of the generated __pthread_mutex_lock function, around the first cmpxchg call.
Comment 2 Andrew Pinski 2006-10-10 16:46:34 UTC
 __asm __volatile ( "call *_dl_sysinfo\n\t" : "=a" (__status) : "0" (240), "b" (&mutex->__data.__lock), "S" (0), "c" (0), "d" (_val), "i" (__builtin_offsetof (tcbhead_t, sysinfo)) : "memory");

I think what you are doing here is invalid, you cannot change the control flow via an inline-asm.
Comment 3 Jakub Jelinek 2006-10-10 18:55:30 UTC
No, that's perfectly valid, you can't jump out of an asm or jump into it,
but if you enter the asm and exit it, it doesn't matter what branches or calls
were used inside it (of course if the function you call inside it is written
in C you need to add used attribute to it to make sure it is not optimized
out if it is not otherwise referenced).
Comment 4 Andrew Pinski 2006-10-11 01:05:16 UTC
(In reply to comment #3)
> No, that's perfectly valid, you can't jump out of an asm or jump into it,
> but if you enter the asm and exit it, it doesn't matter what branches or calls
> were used inside it (of course if the function you call inside it is written
> in C you need to add used attribute to it to make sure it is not optimized
> out if it is not otherwise referenced).

Not really, it is still questionable but it is unrelated to the problem as far as I can tell.  The problem is related to struct aliasing, here is a short testcase which shows the problem (for PPC):
typedef struct
{
  int t;
} pthread_mutex_t;

int t;

int f(pthread_mutex_t *a)
{
  a->t = 1;
  asm("stw%X0 %1,%0" ::"r"(a->t) :"r"(3) : "memory");
  return a->t + t;
}

int main(void)
{
  pthread_mutex_t a;
  if (f(&a)!=3)
     __builtin_abort ();  
}

We should not get 1+t but a->t + t in the .final_cleanup as the asm can clober  memory.
Comment 5 Andrew Pinski 2006-10-11 02:35:36 UTC
(In reply to comment #4)
>   asm("stw%X0 %1,%0" ::"r"(a->t) :"r"(3) : "memory");
This is what I get for making a runtime testcase in the bug report.
That asm should be:
  asm("stw%X0 %1,%0" ::"r"(a->t) ,"r"(3) : "memory");
 
Comment 6 Andrew Pinski 2006-10-11 02:37:15 UTC
(In reply to comment #5)
> (In reply to comment #4)
> >   asm("stw%X0 %1,%0" ::"r"(a->t) :"r"(3) : "memory");
> This is what I get for making a runtime testcase in the bug report.
> That asm should be:
>   asm("stw%X0 %1,%0" ::"r"(a->t) ,"r"(3) : "memory");
Grrr, lets try that again, this time for the correct PPC asm:
  asm("stw %1,0(%0)" ::"r"(a->t) ,"r"(3) : "memory");
Comment 7 Daniel Berlin 2006-10-11 02:43:29 UTC
Actually, AFAICT this is a variant of the struct aliasing bug zdenek reported.
The real problem is that we eliminate the false aliases and because their is no real addressable variable here, the asm's miss out on the SMT's.
Comment 8 Daniel Berlin 2006-10-11 02:46:10 UTC
I've verified my fix for the other two bugs will fix this (sorry that one is taking so long).
Comment 9 Andrew Pinski 2006-10-13 17:44:56 UTC
Note Mark has requested the priorities for new regressions stay at P3 so he can see the new regressions and prioritize them himself.  Reverting back to P3 for that reason.
Comment 10 Daniel Berlin 2006-10-13 17:49:17 UTC
mine
Comment 11 Daniel Berlin 2006-10-19 23:06:06 UTC
Subject: Bug 29415

Author: dberlin
Date: Thu Oct 19 23:05:53 2006
New Revision: 117891

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=117891
Log:
2006-10-19  Daniel Berlin  <dberlin@dberlin.org>

	Fix PR tree-optimization/28778
	Fix PR tree-optimization/29156
	Fix PR tree-optimization/29415
	* tree.h (DECL_PTA_ARTIFICIAL): New macro.
	(tree_decl_with_vis): Add artificial_pta_var flag.
	* tree-ssa-alias.c (is_escape_site): Remove alias info argument,
	pushed into callers.
	* tree-ssa-structalias.c (nonlocal_for_type): New variable.
	(nonlocal_all): Ditto.
	(struct variable_info): Add directly_dereferenced member.
	(var_escaped_vars): New variable.
	(escaped_vars_tree): Ditto.
	(escaped_vars_id): Ditto.
	(nonlocal_vars_id): Ditto.
	(new_var_info): Set directly_dereferenced.
	(graph_size): New variable
	(build_constraint_graph): Use graph_size.
	(solve_graph): Don't process constraints that cannot change the
	solution, don't try to propagate an empty solution to our
	successors.
	(process_constraint): Set directly_dereferenced.
	(could_have_pointers): New function.
	(get_constraint_for_component_ref): Don't process STRING_CST.
	(nonlocal_lookup): New function.
	(nonlocal_insert): Ditto.
	(create_nonlocal_var): Ditto.
	(get_nonlocal_id_for_type): Ditto.
	(get_constraint_for): Allow results vector to be empty in the case
	of string constants.
	Handle results of calls properly.
	(update_alias_info): Update alias info stats on number and type of
	calls.
	(find_func_aliases): Use could_have_pointers.
	(make_constraint_from_escaped): Renamed from
	make_constraint_to_anything, and changed to make constraints from
	escape variable.
	(make_constraint_to_escaped): New function.
	(find_global_initializers): Ditto.
	(create_variable_info_for): Make constraint from escaped to any
	global variable, and from any global variable to the set of
	escaped vars.
	(intra_create_variable_infos): Deal with escaped instead of
	pointing to anything.
	(set_uids_in_ptset): Do type pruning on directly dereferenced
	variables.
	(find_what_p_points_to): Adjust call to set_uids_with_ptset.
	(init_base_vars): Fix comment, and initialize escaped_vars.
	(need_to_solve): Removed.
	(find_escape_constraints): New function.
	(expand_nonlocal_solutions): Ditto.
	(compute_points_to_sets): Call find_escape_constraints and
	expand_nonlocal_solutions.
	(delete_points_to_sets): Don't fall off the end of the graph.
	(init_alias_heapvars): Initialize nonlocal_for_type and
	nonlocal_all.
	(delete_alias_heapvars): Free nonlocal_for_type and null out
	nonlocal_all. 


Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr28778.c
    trunk/gcc/testsuite/gcc.c-torture/execute/pr29156.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pta-fp.c
    trunk/gcc/tree-ssa-alias.c
    trunk/gcc/tree-ssa-operands.c
    trunk/gcc/tree-ssa-structalias.c
    trunk/gcc/tree-ssa-structalias.h
    trunk/gcc/tree.h

Comment 12 Daniel Berlin 2006-10-19 23:07:05 UTC
Fixed.
Comment 13 Christopher Friedt 2007-11-30 15:25:25 UTC
Is there a release of gcc where is this problem fixed, or is it only in the repository?

I've been running into this problem for the last 2 days and was working around the clock, assuming that something was wrong with my code.

(In reply to comment #12)
> Fixed.
> 

Comment 14 Richard Biener 2007-11-30 15:27:10 UTC
As you can see from the target milestone it should be fixed in all released
versions (it was only broken during 4.2.0 development).
Comment 15 Christopher Friedt 2007-11-30 15:48:13 UTC
Right - I guess I should make the Gentoo folks aware of this, because in their current tree anything >= 4.2.0 is marked as unstable. 

Thanks for the reply.

(In reply to comment #14)
> As you can see from the target milestone it should be fixed in all released
> versions (it was only broken during 4.2.0 development).
>