Given a structure and 3 field access
typedef struct network
int *node = net->nodes; <--- A
void *stop = (void *)net->stop_arcs; <--- B
for( arc = net->arcs; arc != (int *)stop; arc++ ) <--- C
GCC generates following instruction sequence in thumb mode with options -O2 -Os, it needs 9 insts to load 3 fields
mov r2, #200 <--- A1
lsl r1, r2, #1 <--- A2
.loc 1 14 0
mov r4, #204 <---- B1
lsl r3, r4, #1 <--- B2
.loc 1 13 0
ldr r2, [r0, r1] <---- A3
.loc 1 15 0
mov r1, #202 <--- C1
.loc 1 14 0
ldr r4, [r0, r3] <--- B3
.loc 1 15 0
lsl r3, r1, #1 <--- C2
ldr r3, [r0, r3] <--- C3
A better method is adjusting the base address first, which is nearer to all 3 fields we will access. Then we can use ldr dest, [base, offset] to load each fields with only 1 instruction.
Although this opportunity is found in target ARM, it should also be applicable to other architectures with addressing mode of (base + offset) and offset has a limited value range.
Created attachment 17940 [details]
test case to show the opportunity
There are a lot of such opportunities in mcf from SPEC CPU 2006.
One possible implementation is to add a pass before cse. In the new pass it should detect insn patterns like:
(set r200 400) # 400 is offset of field1
(set r201 (mem (plus r100 r200))) # r100 contains struct base
(set r300 404) # 404 is offset of field2
(set r301 (mem (plus r100 r300))) # r100 contains struct base
And rewrite them as:
(set r200 400) # keep the original insn
(set r250 (plus r100 400)) # r250 is new base
(set r201 (mem (plus r250 0)))
(set r300 404)
(set r251 (plus r100 400)) # r251 contains same value as r250
(set r301 (mem (plus r251 4)))
We can let dce and cse remove the redundant code, the final result should look like:
(set r101 (plus r100 400))
(set r201 (mem (plus r101 0)))
(set r301 (mem (plus r101 4)))
I think we have enough passes already and should try to stuff this in cse.c and
fwprop.c. See PR middle-end/33699 for related issues.
(In reply to comment #3)
> I think we have enough passes already and should try to stuff this in cse.c and
> fwprop.c. See PR middle-end/33699 for related issues.
It looks that patch solved some similar issues. But there are still several differences:
1. PR/33699 can only handle constant addresses, while in my case the addresses are not constants. And I believe non-constant cases (memory accesses through pointer) occurs more frequently than constant addresses(embedded system only?).
2. That patch can only be applicable to known base address. While in my case, the known base address of memory accesses are the pointer to struct, there is no known nearby base address, so we need to create a new nearby base address.
3. That patch works on superblock, but it looks better to optimize the memory accesses on the whole function body, it is quite common to access memory through same pointer in different basic blocks, as shown in mcf.