Bug 39116 - wrong register used when generating assembly with -O1
Summary: wrong register used when generating assembly with -O1
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: inline-asm
Version: 4.2.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-06 01:47 UTC by Fabrice Ferino
Modified: 2009-02-06 17:16 UTC (History)

See Also:
Host: x86_64-suse-linux
Target: x86_64-suse-linux
Build: x86_64-suse-linux
Known to work:
Known to fail:
Last reconfirmed:


Attachments
preprocessed file (293 bytes, text/plain)
2009-02-06 01:50 UTC, Fabrice Ferino
generated assembly (478 bytes, application/octet-stream)
2009-02-06 01:51 UTC, Fabrice Ferino

Description Fabrice Ferino 2009-02-06 01:47:21 UTC
The inline assembly can end up generating an instruction of the form
mulq %rax
when the two operands should be distinct: the code is meant to compute the product of two different numbers, not the square %rax * %rax. (The x86 mulq instruction multiplies its operand by the %rax register.)
This happens with -O1 and any higher optimization level.
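
For readers without access to the attachments, here is a minimal sketch of how a 64x64 -> 128-bit multiply is commonly written with GCC extended asm. It is not the code from the attached test case: the function names mul64x64 and mul64x64_selfcheck are hypothetical, and the non-x86 fallback assumes a compiler that provides unsigned __int128. The "a" constraint asks the compiler to load the first multiplicand into %rax itself, so the template never has to name the register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the bug's attachment):
 * computes the 128-bit product x * y as hi:lo. */
static inline void mul64x64(uint64_t x, uint64_t y,
                            uint64_t *hi, uint64_t *lo)
{
#if defined(__x86_64__)
    /* mulq multiplies %rax by its single operand and writes the
     * result to %rdx:%rax. Operands: %0 = lo, %1 = hi, %2 = x, %3 = y.
     * "a"(x) makes the compiler place x in %rax before the asm runs. */
    __asm__("mulq %3"
            : "=a"(*lo), "=d"(*hi)
            : "a"(x), "rm"(y)
            : "cc");
#else
    /* Assumed fallback for other 64-bit targets with __int128 support. */
    unsigned __int128 p = (unsigned __int128)x * y;
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
#endif
}

/* Small sanity check with two distinct, initialized multiplicands. */
static void mul64x64_selfcheck(void)
{
    uint64_t hi, lo;
    mul64x64(UINT64_MAX, 2, &hi, &lo);
    assert(hi == 1 && lo == UINT64_MAX - 1);
}
```

With this form the compiler may still emit mulq %rax when both inputs hold the same value, which is correct in that case; it only becomes a wrong result if an input is used without being initialized.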

Version information:

gcc -m64 -c -O1 -v -save-temps gnu_bug_test.c
Using built-in specs.
Target: x86_64-suse-linux
Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.2.1 --enable-ssp --disable-libssp --disable-libgcj --with-slibdir=/lib64 --with-system-zlib --enable-shared --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --program-suffix=-4.2 --enable-version-specific-runtime-libs --without-system-libunwind --with-cpu=generic --host=x86_64-suse-linux
Thread model: posix
gcc version 4.2.1 (SUSE Linux)
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/cc1 -E -quiet -v gnu_bug_test.c -m64 -mtune=generic -O1 -fpch-preprocess -o gnu_bug_test.i
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/../../../../x86_64-suse-linux/include
 /usr/include
End of search list.
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/cc1 -fpreprocessed gnu_bug_test.i -quiet -dumpbase gnu_bug_test.c -m64 -mtune=generic -auxbase gnu_bug_test -O1 -version -o gnu_bug_test.s
GNU C version 4.2.1 (SUSE Linux) (x86_64-suse-linux)
        compiled by GNU C version 4.2.1 (SUSE Linux).
GGC heuristics: --param ggc-min-expand=94 --param ggc-min-heapsize=120388
Compiler executable checksum: 15875aa798ad559dcc04a8709924cdb9
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/../../../../x86_64-suse-linux/bin/as -V -Qy --64 -o gnu_bug_test.o gnu_bug_test.s
GNU assembler version 2.17.50 (x86_64-suse-linux) using BFD version (GNU Binutils) 2.17.50.20070726-14 (SUSE Linux)
Comment 1 Fabrice Ferino 2009-02-06 01:50:20 UTC
Created attachment 17257
preprocessed file

Note that all multiplicands are distinct.
Comment 2 Fabrice Ferino 2009-02-06 01:51:03 UTC
Created attachment 17258
generated assembly

Note the mulq %rax instruction towards the end.
Comment 3 Andrew Pinski 2009-02-06 01:58:46 UTC
b[1] is uninitialized.
Comment 4 Fabrice Ferino 2009-02-06 03:48:52 UTC
(In reply to comment #3)
> b[1] is uninitialized.

Yes, initializing b[1] makes the problem go away in the snippet. I see this problem in a much larger program but I can't reproduce it in a simple context. Feel free to close the bug as invalid since there's a workaround for it, although it impacts performance.
Comment 5 Fabrice Ferino 2009-02-06 04:24:30 UTC
(In reply to comment #4)
> Yes, initializing b[1] makes the problem go away in the snippet. I see this
> problem in a much larger program but I can't reproduce it in a simple context.
> Feel free to close the bug as invalid since there's a workaround for it,
> although it impacts performance.
More details on this: the problem happens only with -O3 in the large context. The workaround is to add a movq %3, %%rax before doing the mulq %4. That's why an invalid mulq %rax was thought to be the problem. I thought the snippet accurately reproduced the problem, but that seems unlikely at this point.
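
A workaround of this kind, an explicit move into %rax inside the asm template, could look roughly like the sketch below. This is an assumption about its general shape, not the code from the larger program: the operand numbers differ from the %3 and %4 quoted above, the names mul64x64_explicit_move and mul64x64_explicit_move_check are hypothetical, and the non-x86 fallback assumes unsigned __int128 is available. The early-clobber marks ("=&a", "=&d") matter here, because the template overwrites %rax before it reads the second input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the "explicit movq" workaround. */
static inline void mul64x64_explicit_move(uint64_t x, uint64_t y,
                                          uint64_t *hi, uint64_t *lo)
{
#if defined(__x86_64__)
    /* Operands: %0 = lo (rax), %1 = hi (rdx), %2 = x, %3 = y.
     * The early-clobber modifier keeps the compiler from placing
     * x or y in %rax or %rdx, so the movq cannot destroy y. */
    __asm__("movq %2, %%rax\n\t"
            "mulq %3"
            : "=&a"(*lo), "=&d"(*hi)
            : "r"(x), "r"(y)
            : "cc");
#else
    /* Assumed fallback for other 64-bit targets with __int128 support. */
    unsigned __int128 p = (unsigned __int128)x * y;
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
#endif
}

/* Small sanity check with two distinct multiplicands. */
static void mul64x64_explicit_move_check(void)
{
    uint64_t hi, lo;
    mul64x64_explicit_move(7, 6, &hi, &lo);
    assert(hi == 0 && lo == 42);
}
```

The extra move costs an instruction when the value is already in %rax, which is consistent with the performance impact mentioned in comment 4; letting the compiler pick the register through an "a" input constraint avoids that cost.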
Comment 6 Fabrice Ferino 2009-02-06 17:16:30 UTC
(In reply to comment #5)
Closing this as invalid. The problem is the interaction of the inline assembly with the optimization option -funswitch-loops. Will submit a new bug if I can find a simple way to reproduce it.