orl vs. orb
Paul Buchheit
paul@google.com
Wed Jan 12 23:02:00 GMT 2000
egcs does not appear to be choosing the best instruction for
bitwise ORs with small constants.
Here is a program that demonstrates this problem (by taking
nearly four times as long to run!):
#include <stdio.h>
int main() {
  const int kMem = 1000000;
  int * mem = new int[kMem];
  int v = 0;
  for (int j = 0; j < 500; j++) {
    for (int i = 0; i < kMem; i++) {
#ifdef GO_SLOW
      v += mem[i] | 6;
#else
      v += mem[i] | 0xaabbccdd;
#endif
    }
  }
  return v;
}
beavis:~/code% gcc --version
egcs-2.91.66
beavis:~/code% gcc -Wall -mpentiumpro -O4 oper.cc
beavis:~/code% time ./a.out
3.47user 0.01system 0:03.47elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (79major+11minor)pagefaults 0swaps
beavis:~/code% gcc -Wall -mpentiumpro -O4 oper.cc -DGO_SLOW
beavis:~/code% time ./a.out
12.72user 0.01system 0:12.73elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (79major+11minor)pagefaults 0swaps
The only difference in the assembly is 'orl' vs. 'orb'.
--- GO_SLOW asm ---
movl (%edx),%eax
orb $6,%al
addl %eax,%ecx
--- fast asm ---
movl (%edx),%eax
orl $-1430532899,%eax
addl %eax,%ecx
---
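For illustration only (this is not part of the original test case): a minimal sketch of why the compiler may pick either encoding. When the constant fits in the low byte, OR-ing only the low byte, as 'orb $6,%al' does, leaves the same 32-bit value in the register as a full-width 'orl $6,%eax' would. The helper names or_full, or_low_byte, same_result and check_sample_values are hypothetical. The report does not establish why the 'orb' form is slower. One plausible cause, stated here as an assumption, is the partial register stall that Intel documents for P6-family CPUs when a write to %al is followed by a read of %eax.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models `orl $6,%eax`, a full 32-bit OR.
static std::uint32_t or_full(std::uint32_t x) {
    return x | 6u;
}

// Hypothetical helper: models `orb $6,%al`, which ORs only the low byte
// and leaves the upper 24 bits of the register untouched.
static std::uint32_t or_low_byte(std::uint32_t x) {
    std::uint8_t low = static_cast<std::uint8_t>(x & 0xffu);
    low = static_cast<std::uint8_t>(low | 6u);
    return (x & 0xffffff00u) | low;
}

// True when both forms produce the same 32-bit result for x.
static bool same_result(std::uint32_t x) {
    return or_full(x) == or_low_byte(x);
}

// Spot-check a few values; the two forms should always agree because
// the constant 6 has no bits set above the low byte.
static void check_sample_values() {
    const std::uint32_t samples[] = {0u, 1u, 6u, 0xffu, 0x100u,
                                     0xaabbccddu, 0xffffffffu};
    for (std::uint32_t x : samples) {
        assert(same_result(x));
    }
}
```

Since the two forms compute the same value, the choice between them is purely a code-size and speed trade-off, which is why the timing difference above matters.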
My system:
beavis:~/code% uname -a
Linux beavis.corp.google.com 2.2.11 #4 Mon Dec 6 18:56:10 PST 1999 i686 unknown
beavis:~/code% cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 6
model name : Celeron (Mendocino)
stepping : 0
cpu MHz : 434.330085
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx osfxsr
bogomips : 432.54