This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/27016] ARM optimizer produces severely suboptimal code
- From: "steven at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 30 Jun 2009 11:35:26 -0000
- Subject: [Bug middle-end/27016] ARM optimizer produces severely suboptimal code
- References: <bug-27016-12470@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #3 from steven at gcc dot gnu dot org 2009-06-30 11:35 -------
For this test case:
unsigned int code_in_ram[100];

void testme(void)
{
  unsigned int *p_rom, *p_ram, *p_end, len;
  extern unsigned int _ram_erase_sector_start;
  extern unsigned int _ram_erase_sector_end;

  p_ram = code_in_ram;
  p_rom = &_ram_erase_sector_start;
  len = ((unsigned int)&_ram_erase_sector_end
         - (unsigned int)&_ram_erase_sector_start) / sizeof(unsigned int);
  for (len = 100; len > 0; --len)
    {
      *p_ram++ = *p_rom++;
    }
}
I get the following code with a 4.5.0 checkout from a few days ago (see
.ident):
.cpu arm7tdmi
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 4
.eabi_attribute 18, 4
.file "t.c"
.text
.align 2
.global testme
.type testme, %function
testme:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r1, .L5
ldr r2, .L5+4
mov r3, #0
.L2:
ldr r0, [r2, r3]
str r0, [r1, r3]
add r3, r3, #4
cmp r3, #400
bne .L2
bx lr
.L6:
.align 2
.L5:
.word code_in_ram
.word _ram_erase_sector_start
.size testme, .-testme
.comm code_in_ram,400,4
.ident "GCC: (GNU) 4.5.0 20090626 (experimental) [trunk revision 148960]"
This looks marginally better than the code of comment #2.
With the original test case of comment #0 I get the following code:
.cpu arm7tdmi
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 4
.eabi_attribute 18, 4
.file "t.c"
.text
.align 2
.global testme
.type testme, %function
testme:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r2, .L5
ldr r3, .L5+4
ldr r1, .L5+8
b .L2
.L3:
ldr r0, [r3], #4
str r0, [r2, #-4]
.L2:
cmp r3, r1
add r2, r2, #4
bcc .L3
bx lr
.L6:
.align 2
.L5:
.word code_in_ram
.word _ram_erase_sector_start
.word _ram_erase_sector_end
.size testme, .-testme
.comm code_in_ram,400,4
.ident "GCC: (GNU) 4.5.0 20090626 (experimental) [trunk revision 148960]"
The "ldr r3, .L6+8" line of comment #0 is now hoisted out of the loop; the
value is loaded into r1 once by "ldr r1, .L5+8". GCC still does not use a
post-increment for the store in the loop (see the use of r2: a separate
"add r2, r2, #4" plus a "#-4" offset on the store), even though it uses a
post-increment for the load.
So the bug is still there. It might be a dup of one of the many other bugs
about poor use of auto-increment instructions by GCC... ;-)
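
For reference, here is a minimal sketch of the copy loop as a standalone function, with the loop body one would hope for written out in a comment. The assembly in the comment is hand-written for illustration and is not the output of any GCC version. The names rom_image, copy_words and demo are hypothetical: rom_image stands in for the linker-provided _ram_erase_sector_start/_ram_erase_sector_end region so that the example links on its own.

```c
#include <assert.h>
#include <stddef.h>

enum { WORDS = 100 };

/* Hypothetical stand-in for the region delimited by the linker symbols
   _ram_erase_sector_start and _ram_erase_sector_end in the test case. */
static unsigned int rom_image[WORDS];
unsigned int code_in_ram[WORDS];

/* Copy the words in [src, src_end) to dst and return the advanced dst.
   Both pointers are stepped with post-increment, so on ARM the body could
   in principle be (hand-written sketch, NOT compiler output):

       .L3:  ldr  r0, [r3], #4    @ load,  then r3 += 4
             str  r0, [r2], #4    @ store, then r2 += 4
             cmp  r3, r1
             bcc  .L3

   which drops the separate "add r2, r2, #4" seen in the listing above. */
static unsigned int *copy_words(unsigned int *dst,
                                const unsigned int *src,
                                const unsigned int *src_end)
{
    while (src < src_end)
        *dst++ = *src++;
    return dst;
}

/* Fill the stand-in ROM image, copy it, and check the result.
   Returns the number of words written to code_in_ram. */
static size_t demo(void)
{
    size_t i;
    unsigned int *end;

    for (i = 0; i < WORDS; ++i)
        rom_image[i] = 0xA5000000u + (unsigned int)i;

    end = copy_words(code_in_ram, rom_image, rom_image + WORDS);

    for (i = 0; i < WORDS; ++i)
        assert(code_in_ram[i] == rom_image[i]);

    return (size_t)(end - code_in_ram);
}
```

Compiling copy_words with an ARM cross compiler and -S is one way to check whether a given GCC version emits the post-increment store; demo only verifies that the copy itself is correct.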
--
steven at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-06-30 11:35:26
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016