Bug 40314 - inefficient address calculation of fields of large struct
Summary: inefficient address calculation of fields of large struct
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.5.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Depends on:
Blocks: 16996
  Show dependency treegraph
Reported: 2009-05-31 02:38 UTC by Carrot
Modified: 2010-03-20 19:27 UTC (History)
3 users (show)

See Also:
Host: i686-linux
Target: arm-eabi
Build: i686-linux
Known to work:
Known to fail: 4.5.0
Last reconfirmed: 2009-05-31 07:22:23

test case to show the opportunity (189 bytes, text/plain)
2009-05-31 02:42 UTC, Carrot

Note You need to log in before you can comment on or make changes to this bug.
Description Carrot 2009-05-31 02:38:13 UTC
Given a structure and 3 field access

typedef struct network
   char inputfile[400];
   int* nodes;
   int* arcs;
   int* stop_arcs;
} network_t;

   int *arc;
   int *node = net->nodes;                                           <--- A
   void *stop = (void *)net->stop_arcs;                              <--- B
   for( arc = net->arcs; arc != (int *)stop; arc++ )                 <--- C

GCC generates following instruction sequence in thumb mode with options -O2 -Os, it needs 9 insts to load 3 fields 

       mov     r2, #200         <---  A1
       lsl     r1, r2, #1       <---  A2
       .loc 1 14 0
       mov     r4, #204         <----  B1
       lsl     r3, r4, #1       <---   B2
       .loc 1 13 0
       ldr     r2, [r0, r1]      <----  A3
       .loc 1 15 0
       mov     r1, #202          <---  C1
       .loc 1 14 0
       ldr     r4, [r0, r3]      <--- B3
       .loc 1 15 0
       lsl     r3, r1, #1        <---  C2
       ldr     r3, [r0, r3]       <---  C3

A better method is adjusting the base address first, which is nearer to all 3 fields we will access. Then we can use ldr dest, [base, offset] to load each fields with only 1 instruction.

Although this opportunity is found in target ARM, it should also be applicable to other architectures with addressing mode of (base + offset) and offset has a limited value range.
Comment 1 Carrot 2009-05-31 02:42:09 UTC
Created attachment 17940 [details]
test case to show the opportunity
Comment 2 Carrot 2009-05-31 02:51:37 UTC
There are a lot of such opportunities in mcf from SPEC CPU 2006.

One possible implementation is to add a pass before cse. In the new pass it should detect insn patterns like:

(set r200 400)                     # 400 is offset of field1
(set r201 (mem (plus r100 r200)))  # r100 contains struct base
(set r300 404)                     # 404 is offset of field2
(set r301 (mem (plus r100 r300)))  # r100 contains struct base

And rewrite them as:

(set r200 400)                     # keep the original insn
(set r250 (plus r100 400))         # r250 is new base
(set r201 (mem (plus r250 0)))
(set r300 404)
(set r251 (plus r100 400))         # r251 contains same value as r250
(set r301 (mem (plus r251 4)))

We can let dce and cse remove the redundant code, the final result should look like:

(set r101 (plus r100 400))
(set r201 (mem (plus r101 0)))
(set r301 (mem (plus r101 4)))
Comment 3 Eric Botcazou 2009-05-31 07:22:22 UTC
I think we have enough passes already and should try to stuff this in cse.c and
fwprop.c.  See PR middle-end/33699 for related issues.
Comment 4 Carrot 2009-05-31 08:05:34 UTC
(In reply to comment #3)
> I think we have enough passes already and should try to stuff this in cse.c and
> fwprop.c.  See PR middle-end/33699 for related issues.

It looks that patch solved some similar issues. But there are still several differences:

1. PR/33699 can only handle constant addresses, while in my case the addresses are not constants. And I believe non-constant cases (memory accesses through pointer) occurs more frequently than constant addresses(embedded system only?).

2. That patch can only be applicable to known base address. While in my case, the known base address of memory accesses are the pointer to struct, there is no known nearby base address, so we need to create a new nearby base address.

3. That patch works on superblock, but it looks better to optimize the memory accesses on the whole function body, it is quite common to access memory through same pointer in different basic blocks, as shown in mcf.