This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/40772] New: generating redundant moves from second byte of 32b/64b register
- From: "zsojka at seznam dot cz" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 16 Jul 2009 15:32:47 -0000
- Subject: [Bug rtl-optimization/40772] New: generating redundant moves from second byte of 32b/64b register
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
For the following code:
------------------------------------------------
#include <stdint.h>

uint8_t data[16];

static __attribute__((noinline)) void test(unsigned i)
{
    unsigned j;

    /* Store bits 8..15 of (i + j) into data[j]. */
    for (j = 0; j < 16; j++)
        data[j] = ((i + j) & 0xFF00) >> 8;
}
------------------------------------------------
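As a reference for what the loop is expected to store, here is a minimal portable sketch of the same computation. The helper name `fill_second_bytes` and the caller-supplied output buffer are assumptions made for illustration; they are not part of the attached test code.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): same computation as test(),
 * but writing into a caller-supplied buffer so the result can be checked. */
static void fill_second_bytes(uint8_t out[16], unsigned i)
{
    unsigned j;

    for (j = 0; j < 16; j++)
        out[j] = (uint8_t)(((i + j) & 0xFF00u) >> 8);  /* bits 8..15 of the sum */
}
```

For example, with `i = 0x1F8` the sums run from 0x1F8 to 0x207, so the first eight entries hold 1 and the last eight hold 2. The stored byte depends on the carry out of the low byte of `i + j`, which is why each element needs its own addition before the second byte is extracted.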
the generated asm looks like this (compiled with -fno-tree-vectorize because of PR40771):
# ./gcc tst2b.c -o tst2.o -O3 -march=k8 -fno-tree-vectorize
------------------------------------------------
test:
.LFB11:
        .cfi_startproc
        movq    %rdi, %rdx
        movzbl  %dh, %eax
        movb    %al, data(%rip)
        leal    1(%rdi), %eax
        movzbl  %ah, %eax
        movb    %al, data+1(%rip)
        leal    2(%rdi), %eax
        movzbl  %ah, %eax
        movb    %al, data+2(%rip)
        leal    3(%rdi), %eax
        movzbl  %ah, %eax
        movb    %al, data+3(%rip)
        .....
------------------------------------------------
When "movzbl %ah, %eax ; movb %al, data+1(%rip)" is replaced by
"movb %ah, data+1(%rip)", the code is faster. (Another issue may be the use of
lea even for -march=pentium4, which would probably prefer add eax,1, but I
can't verify that.)
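The replacement is valid because the stored value is simply bits 8..15 of the sum, which is what %ah holds when the sum is in %eax; the zero-extension into the full register is discarded by the byte store. A small C sketch of that equivalence follows. The function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* The form used in the report: mask bits 8..15, then shift them down. */
static uint8_t second_byte_masked(uint32_t x)
{
    return (uint8_t)((x & 0xFF00u) >> 8);
}

/* Shift, then narrow: converting to uint8_t keeps only the low 8 bits,
 * so the explicit mask is not needed before a byte-sized store. */
static uint8_t second_byte_narrowed(uint32_t x)
{
    return (uint8_t)(x >> 8);
}

/* Count the inputs in [first, last] for which the two forms disagree. */
static unsigned long count_mismatches(uint32_t first, uint32_t last)
{
    unsigned long bad = 0;
    uint32_t x = first;

    for (;;) {
        if (second_byte_masked(x) != second_byte_narrowed(x))
            bad++;
        if (x == last)
            break;
        x++;
    }
    return bad;
}
```

Calling `count_mismatches(0, 0x1FFFF)` should report no disagreements, since both forms select the same eight bits.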
# ./gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --enable-languages=c,c++
--prefix=/mnt/svn/gcc-trunk/build/
Thread model: posix
gcc version 4.5.0 20090714 (experimental) (GCC)
The CPU is an AMD Phenom (4 cores, Barcelona) running at a fixed 1400 MHz.
GCC's generated code runs in 19 ticks on average; the code with "movzbl ; mov al"
replaced by "mov ah" runs in 16 ticks.
The whole test code is attached.
--
Summary: generating redundant moves from second byte of 32b/64b
register
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: zsojka at seznam dot cz
GCC host triplet: x86_64-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40772