This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug tree-optimization/50304] New: poor code for accessing certain element of arrays


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304

             Bug #: 50304
           Summary: poor code for accessing certain element of arrays
    Classification: Unclassified
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: tom@deltasystem.hu


Array elements on known (hardcoded) positions are accessed by inefficient code.
The addresses are computed runtime, waisting both runtime and code size. For
example:

int a[4][4][16];

void foo( void )
{
    int c;

    c = a[2][3][4];
    ...
}
compiles this way in -O1, ARM Cortex M0:
   0:    4b02          ldr    r3, [pc, #8]    ; (c <foo+0xc>)
   2:    22b4          movs    r2, #180    ; 0xb4
   4:    0092          lsls    r2, r2, #2
   6:    589a          ldr    r2, [r3, r2]
-----------
I tried to convert this operation to be precompiled by forming a constant
address.

int a[4][4][16];
int *const b=&a[2][3][4];

void foo( void )
{
    int c;

    c = *b;
    ...
}
However, it is ignored by gcc, and compiles this way in -O1:
   0:    4b03          ldr    r3, [pc, #12]    ; (10 <foo+0x10>)
   2:    21b4          movs    r1, #180    ; 0xb4
   4:    0089          lsls    r1, r1, #2
   6:    185a          adds    r2, r3, r1
   8:    6812          ldr    r2, [r2, #0]

The actual code can vary, depending on the situation. For example:
   0:    4b02          ldr    r3, [pc, #8]    ; (c <foo+0xc>)
   2:    4a03          ldr    r2, [pc, #12]    ; (10 <foo+0x10>)
   4:    589a          ldr    r2, [r3, r2]

or
   0:    4b02          ldr    r3, [pc, #8]    ; (c <foo+0xc>)
   2:    4903          ldr    r1, [pc, #12]    ; (10 <foo+0x10>)
   4:    185a          adds    r2, r3, r1    a[0][0][0]=c;
   6:    6812          ldr    r2, [r2, #0]

Sometimes (I don't know yet why and exactly when) it can be even much worse,
introducing bytewide accessing of e.g. an int32_t, dissassembling and
reassembling it. (It's definetely not an alignment problem.) This waists about
40 instructions for a read-modify-write access.
------------------

It should have look like this:
   0:    4b02          ldr    r3, [pc, #8]    ; (c <foo+0xc>)
   2:    681b          ldr    r3, [r3, #0]
   4:    681a          ldr    r2, [r3, #0]
where the constant (at 0x0c) at the first row is a precalculated address.

------------------
With variable address, like this:

int a[4][4][16];
int *b=&a[2][3][4];
...
it work nicely. However, it is really not equivalent solution at flash based
processors.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]