This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: New rematerialization sub-pass in LRA
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: "'Vladimir Makarov'" <vmakarov at redhat dot com>
- Cc: <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 14 Oct 2014 17:37:42 +0100
- Subject: RE: New rematerialization sub-pass in LRA
- References: <5437F4EC dot 2070809 at redhat dot com> <543BC697 dot 4010207 at arm dot com> <000701cfe702$1b770710$52651530$ at com> <543D307C dot 8090703 at redhat dot com>
> Wilco Dijkstra wrote:
> > Vladimir Makarov wrote:
> > > On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and
> > > SPECFP is ~0.2% faster.
> > Thanks for reporting this. It is important for me as I have no aarch64
> > machine for benchmarking.
> >
> > Perlbmk performance degradation is too big and I'll definitely look at
> > this problem.
>
> Looking at the diffs in regexec.c which has the hot function regmatch(),
> nothing obvious stands out that could cause a serious regression.
> I did notice this around line 2300:
>
> .L802:
> ldr x1, [x23, 48]
> adrp x5, PL_savestack_ix
> ldr w0, [x23]
> str x5, [sp, 104]
> str x1, [x24, #:lo12:PL_regcc]
> ldr w27, [x1, 4]
> bl regcppush
> - ldr x5, [sp, 104]
> str w0, [sp, 112]
> ldr x0, [x23, 32]
> + adrp x5, PL_savestack_ix
> ldr w28, [x5, #:lo12:PL_savestack_ix]
> + str x5, [sp, 104]
> bl regmatch
> ldr x5, [sp, 104]
> mov w19, w0
> ldr w1, [sp, 112]
> ldr w0, [x5, #:lo12:PL_savestack_ix]
>
> So it rematerializes one instance, but fails to rematerialize the second use.
> An extra store is inserted, and the first adrp and store are not removed as dead.
Here is a simple example that reproduces the issue when compiled with
-mcpu=cortex-a57 -O2 -fomit-frame-pointer
-ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23 -ffixed-x24 -ffixed-x25
-ffixed-x26 -ffixed-x27 -ffixed-x28 -ffixed-x29 -ffixed-x30. It looks like an odd
interaction between -fcaller-saves and rematerialization.
void g(void);
int x;
int f3b(int y)
{
  y += x;
  g();
  y += x;
  g();
  y += x;
  return y;
}
f3b:
        adrp    x2, x                   --> DEAD
        sub     sp, sp, #16
        ldr     w1, [x2, #:lo12:x]
        str     x2, [sp]                --> DEAD
        add     w0, w0, w1
        str     w0, [sp]                --> reuse of stackslot!!!
        bl      g
        adrp    x2, x
        ldr     w0, [sp]
        ldr     w1, [x2, #:lo12:x]
        str     x2, [sp, 8]
        add     w0, w0, w1
        str     w0, [sp]                --> REMOVE
        bl      g
        ldr     x2, [sp, 8]             --> rematerialize adrp
        ldr     w0, [sp]
        add     sp, sp, 16
        ldr     w1, [x2, #:lo12:x]
        add     w0, w0, w1
        ret
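
For anyone who wants to check the behaviour of the example independently of the
register allocation, here is a minimal sketch of a harness. The body of g(), the
g_calls counter and check_f3b() are assumptions of mine and not part of the
original test case, which only declares g(). The point it illustrates: the value
of x has to be reloaded after every call because g() may change it, whereas the
address of x (the adrp result) is a constant and so is a candidate for
rematerialization rather than a spill.

```c
#include <assert.h>

int x;               /* global from the original example */
static int g_calls;  /* hypothetical counter, added for this sketch */

/* Hypothetical body for g(); the original message only declares it.
   It modifies x, so each `y += x` must reload the value of x. */
void g(void)
{
    g_calls++;
    x += 10;
}

/* f3b exactly as in the example above. */
int f3b(int y)
{
    y += x;
    g();
    y += x;
    g();
    y += x;
    return y;
}

/* Hypothetical helper: runs f3b(100) starting from x == 1 and
   checks the result. */
int check_f3b(void)
{
    x = 1;
    g_calls = 0;
    int r = f3b(100);   /* y goes 101, then 112, then 133 */
    assert(r == 133);
    assert(g_calls == 2);
    return r;
}
```

Note that with g() defined in the same file the compiler is free to inline it at
-O2, in which case the calls and the spills around them disappear. To look at
the assembly (gcc -S with the options listed above), g() probably needs to stay
in a separate file as in the original, or be marked
__attribute__((noinline)).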
Wilco