This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug regression/44281] New: Global Register variable pessimisation and regression
- From: "adam at consulting dot net dot nz" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 26 May 2010 05:12:27 -0000
- Subject: [Bug regression/44281] New: Global Register variable pessimisation and regression
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
I am aware that the developers have resolved as WONTFIX a report that GCC is a
pessimising compiler with respect to some global register variable issues:
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42596>
GCC is copying registers for no good reason whatsoever. Below is a very simple
example where gcc 3.3.6 does a better job of optimising the code. Unnecessary
copying of registers may also occur with local register variables.
#include <stdint.h>

register uint64_t global_flag_stack __asm__("rbx");

void push_flag_into_global_reg_var(uint64_t a, uint64_t b) {
    uint64_t flag = (a==b);
    global_flag_stack <<= 8;
    global_flag_stack |= flag;
}

uint64_t push_flag_into_local_var(uint64_t a, uint64_t b,
                                  uint64_t local_flag_stack) {
    uint64_t flag = (a==b);
    local_flag_stack <<= 8;
    return local_flag_stack | flag;
}

int main() {
}
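To make the intent of the two functions concrete, here is a small portable sketch of the same flag-stack push. It works on an ordinary value rather than on rbx, so it does not depend on GCC's global register variable extension. The name push_flag and its argument order are hypothetical and are not part of the testcase above.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the original testcase: the same
   arithmetic as push_flag_into_local_var, with the stack passed first. */
uint64_t push_flag(uint64_t flag_stack, uint64_t a, uint64_t b) {
    uint64_t flag = (a == b);   /* 1 if the operands are equal, else 0 */
    flag_stack <<= 8;           /* make room for one byte-wide flag */
    return flag_stack | flag;   /* the newest flag lands in the low byte */
}
```

Starting from an empty stack, push_flag(0, 5, 5) yields 0x01, pushing a mismatch onto that yields 0x0100, and pushing another match yields 0x010001. Each call should need only a compare, a sete, a shift and an or, which is what gcc 3.3.6 emits.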
gcc-3.3 (GCC) 3.3.6 (Debian 1:3.3.6-15):
$ gcc-3.3 -Os flags.c && objdump -d -m i386:x86-64:intel a.out|less
...
0000000000400478 <push_flag_into_global_reg_var>:
  400478:   31 c0         xor    eax,eax
  40047a:   48 39 f7      cmp    rdi,rsi
  40047d:   0f 94 c0      sete   al
  400480:   48 c1 e3 08   shl    rbx,0x8
  400484:   48 09 c3      or     rbx,rax
  400487:   c3            ret

0000000000400488 <push_flag_into_local_var>:
  400488:   31 c0         xor    eax,eax
  40048a:   48 39 f7      cmp    rdi,rsi
  40048d:   0f 94 c0      sete   al
  400490:   48 c1 e2 08   shl    rdx,0x8
  400494:   48 09 d0      or     rax,rdx
  400497:   c3            ret
...
gcc-4.1 (GCC) 4.1.3 20080704 (prerelease) (Debian 4.1.2-29):
$ gcc-4.1 -Os flags.c && objdump -d -m i386:x86-64:intel a.out|less
...
0000000000400448 <push_flag_into_global_reg_var>:
  400448:   48 89 da      mov    rdx,rbx
  40044b:   31 c0         xor    eax,eax
  40044d:   48 c1 e2 08   shl    rdx,0x8
  400451:   48 39 f7      cmp    rdi,rsi
  400454:   0f 94 c0      sete   al
  400457:   48 89 d3      mov    rbx,rdx
  40045a:   48 09 c3      or     rbx,rax
  40045d:   c3            ret

000000000040045e <push_flag_into_local_var>:
  40045e:   48 c1 e2 08   shl    rdx,0x8
  400462:   31 c0         xor    eax,eax
  400464:   48 39 f7      cmp    rdi,rsi
  400467:   0f 94 c0      sete   al
  40046a:   48 09 d0      or     rax,rdx
  40046d:   c3            ret
...
gcc-4.5 (Debian 4.5.0-1) 4.5.0:
$ gcc-4.5 -Os flags.c && objdump -d -m i386:x86-64:intel a.out|less
...
0000000000400494 <push_flag_into_global_reg_var>:
  400494:   31 d2         xor    edx,edx
  400496:   48 39 f7      cmp    rdi,rsi
  400499:   48 89 d8      mov    rax,rbx
  40049c:   0f 94 c2      sete   dl
  40049f:   48 c1 e0 08   shl    rax,0x8
  4004a3:   48 89 d3      mov    rbx,rdx
  4004a6:   48 09 c3      or     rbx,rax
  4004a9:   c3            ret

00000000004004aa <push_flag_into_local_var>:
  4004aa:   48 89 d0      mov    rax,rdx
  4004ad:   31 d2         xor    edx,edx
  4004af:   48 c1 e0 08   shl    rax,0x8
  4004b3:   48 39 f7      cmp    rdi,rsi
  4004b6:   0f 94 c2      sete   dl
  4004b9:   48 09 d0      or     rax,rdx
  4004bc:   c3            ret
...
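The object code that current GCC generates is embarrassing compared with
GCC 3.3.6. For push_flag_into_global_reg_var, gcc 4.1.3 and 4.5.0 each emit
22 bytes, including two register-to-register mov instructions, where gcc 3.3.6
emits 16 bytes and no mov at all. Is it also necessary to increase the code
footprint of push_flag_into_local_var from 16 bytes (gcc 3.3.6 and 4.1.3) to
19 bytes (gcc 4.5.0) when optimising for size (-Os)?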
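The footprint of each function can be read directly off the listings above. As a minimal sketch of the arithmetic, assuming the function ends with the one-byte ret (c3) as every listing here does, the size is the address of that ret plus one, minus the address of the first instruction. The helper name code_size is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: size in bytes of a function whose first instruction
   is at first_insn and whose final instruction is a one-byte ret (c3)
   located at ret_insn. */
uint64_t code_size(uint64_t first_insn, uint64_t ret_insn) {
    return ret_insn + 1 - first_insn;
}
```

For push_flag_into_global_reg_var this gives code_size(0x400478, 0x400487) = 16 with gcc 3.3.6, code_size(0x400448, 0x40045d) = 22 with gcc 4.1.3 and code_size(0x400494, 0x4004a9) = 22 with gcc 4.5.0. For push_flag_into_local_var the sizes are 16, 16 and 19 bytes respectively.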
--
Summary: Global Register variable pessimisation and regression
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: adam at consulting dot net dot nz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281