This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/50304] New: poor code for accessing certain element of arrays
- From: "tom at deltasystem dot hu" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 06 Sep 2011 14:59:02 +0000
- Subject: [Bug tree-optimization/50304] New: poor code for accessing certain element of arrays
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304
Bug #: 50304
Summary: poor code for accessing certain element of arrays
Classification: Unclassified
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: tom@deltasystem.hu
Array elements on known (hardcoded) positions are accessed by inefficient code.
The addresses are computed runtime, waisting both runtime and code size. For
example:
int a[4][4][16];
void foo( void )
{
int c;
c = a[2][3][4];
...
}
compiles this way in -O1, ARM Cortex M0:
0: 4b02 ldr r3, [pc, #8] ; (c <foo+0xc>)
2: 22b4 movs r2, #180 ; 0xb4
4: 0092 lsls r2, r2, #2
6: 589a ldr r2, [r3, r2]
-----------
I tried to convert this operation to be precompiled by forming a constant
address.
int a[4][4][16];
int *const b=&a[2][3][4];
void foo( void )
{
int c;
c = *b;
...
}
However, it is ignored by gcc, and compiles this way in -O1:
0: 4b03 ldr r3, [pc, #12] ; (10 <foo+0x10>)
2: 21b4 movs r1, #180 ; 0xb4
4: 0089 lsls r1, r1, #2
6: 185a adds r2, r3, r1
8: 6812 ldr r2, [r2, #0]
The actual code can vary, depending on the situation. For example:
0: 4b02 ldr r3, [pc, #8] ; (c <foo+0xc>)
2: 4a03 ldr r2, [pc, #12] ; (10 <foo+0x10>)
4: 589a ldr r2, [r3, r2]
or
0: 4b02 ldr r3, [pc, #8] ; (c <foo+0xc>)
2: 4903 ldr r1, [pc, #12] ; (10 <foo+0x10>)
4: 185a adds r2, r3, r1 a[0][0][0]=c;
6: 6812 ldr r2, [r2, #0]
Sometimes (I don't know yet why and exactly when) it can be even much worse,
introducing bytewide accessing of e.g. an int32_t, dissassembling and
reassembling it. (It's definetely not an alignment problem.) This waists about
40 instructions for a read-modify-write access.
------------------
It should have look like this:
0: 4b02 ldr r3, [pc, #8] ; (c <foo+0xc>)
2: 681b ldr r3, [r3, #0]
4: 681a ldr r2, [r3, #0]
where the constant (at 0x0c) at the first row is a precalculated address.
------------------
With variable address, like this:
int a[4][4][16];
int *b=&a[2][3][4];
...
it work nicely. However, it is really not equivalent solution at flash based
processors.