[alpha] Wrong code produced at -Os, -O2, and -O3

Hi Uros and Richard,
I was rewriting the Alpha sched_find_first_bit implementation for the
Linux Kernel, and in the process I think I've come across a gcc bug.

I rewrote the function using cmov instructions and wrote a small
program to test its correctness and performance. I initially wrote the
function as an external .S file and, once I was reasonably sure it
was correct, converted it to a C function with inline assembly.
Both versions compile to exactly the same output, shown below:

        ldq     t0,0(a0)
        clr     t2
        ldq     t1,8(a0)
        cmoveq  t0,0x40,t2
        cmoveq  t0,t1,t0
        cttz    t0,t3
        addq    t3,t2,v0
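
For reference, here is roughly what I intend that sequence to compute,
sketched as plain C. This is only a model for comparison, not the code
in the tarball: the names model_cttz and model_find_first_bit are made
up for illustration, and returning 64 for a zero word is an assumption
of the model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cttz: count trailing zero bits.
 * Returning 64 for a zero input is an assumption of this model. */
static unsigned model_cttz(uint64_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 64;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical model of the sequence: b[0] is the low word, b[1] the high. */
static unsigned model_find_first_bit(const uint64_t b[2])
{
    uint64_t t0 = b[0];          /* ldq    t0,0(a0)   */
    unsigned t2 = 0;             /* clr    t2         */
    uint64_t t1 = b[1];          /* ldq    t1,8(a0)   */

    if (t0 == 0) {
        t2 = 64;                 /* cmoveq t0,0x40,t2 */
        t0 = t1;                 /* cmoveq t0,t1,t0   */
    }
    return model_cttz(t0) + t2;  /* cttz, then addq   */
}

/* Spot checks: first set bit in the low word, then in the high word. */
static void model_self_check(void)
{
    const uint64_t low[2]  = { 0x8, 0x1 };
    const uint64_t high[2] = { 0x0, 0x10 };

    assert(model_find_first_bit(low) == 3);
    assert(model_find_first_bit(high) == 68);
}
```

Both cmoveq instructions test the original value of t0, which is why
the model puts the two moves under a single `if`.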

In my test program, I found that the rewritten implementation produces
bogus results when it is executed _before_ the reference
implementation. This only happens with the C/inline asm function; when
compiled with the external .S file, the results are correct.

Attached is a tar.gz with my test code. Compile the test program with
`gcc -O -mcpu=... find.c rewritten.S test.c -o test`, optionally adding
-D__REWRITTEN_INLINE and -D__REWRITTEN_FIRST. With both flags defined
at -Os, -O2, or -O3, the program produces incorrect results and the
assert() fails. At -O0 or -O1, or without one or both of the -D flags,
it produces correct results. I've tested with gcc-4.3.4 and gcc-4.4.2.

Thanks. Let me know what I can do to help further.

Matt Turner

