This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/65135] [5 Regression] Performance regression in pic mode after r220674.
- From: "vmakarov at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 06 Mar 2015 19:43:46 +0000
- Subject: [Bug rtl-optimization/65135] [5 Regression] Performance regression in pic mode after r220674.
- Auto-submitted: auto-generated
- References: <bug-65135-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65135
Vladimir Makarov <vmakarov at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org
--- Comment #6 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
There is nothing that can be done here. RA is all about heuristics, not
about optimal solutions.
In this case we have
movl %esi, 4(%esp) # 251 *movsi_internal/2 [length = 4]
movl %eax, 20(%esp) # 276 *movsi_internal/2 [length = 4]
movl R@GOTOFF(%ebp), %eax # 51 *movsi_internal/1 [length = 6]
movl %eax, 24(%esp) # 277 *movsi_internal/2 [length = 4]
.p2align 4,,10
.p2align 3
.L6:
movl 4(%esp), %ebx # 367 *movsi_internal/1 [length = 4]
testl %ebx, %ebx # 368 *cmpsi_ccno_1/1 [length = 2]
jne .L9 # 75 *jcc_1 [length = 6]
movl 20(%esp), %eax # 280 *movsi_internal/1 [length = 4]
movl (%eax,%edx), %eax # 77 *movsi_internal/1 [length = 3]
cmpl $-1, %eax # 85 *cmpsi_1/1 [length = 3]
je .L11 # 86 *jcc_1 [length = 6]
.L42:
movl 8(%esp), %edi # 282 *movsi_internal/1 [length = 4]
leal 0(,%eax,4), %edx # 317 *leasi [length = 7]
movl %ecx, %ebx # 90 *movsi_internal/1 [length = 2]
addl %edx, %edi # 89 *addsi_1/1 [length = 2]
cmpl $101, %ecx # 92 *cmpsi_1/1 [length = 3]
je .L40 # 93 *jcc_1 [length = 6]
movl 12(%esp), %esi # 278 *movsi_internal/1 [length = 4]
cmpl (%edi), %esi # 56 *cmpsi_1/2 [length = 2]
leal 1(%ebx), %ecx # 318 *leasi [length = 3]
jne .L6 # 57 *jcc_1 [length = 2]
Insn #251 is created by IRA for live range splitting of p126 around
the loop. The live range in the loop uses p154. So after IRA we have
p126 and p154 assigned to SI. Then, in the live range of p154, LRA
generates a reload for insn 56:
Creating newreg=171 from oldreg=92, assigning class GENERAL_REGS to r171
56: flags:CCZ=cmp(r171:SI,[r158:SI])
Inserting insn reload before:
278: r171:SI=r92:SI
and tries to assign a hard reg to p171:
Assigning to 171 (cl=GENERAL_REGS, orig=92, freq=1314, tfirst=171, tfreq=1314)...
Trying 0: spill 157(freq=2604) Now best 0(cost=1290)
Trying 1: spill 155(freq=2606)
Trying 2: spill 153(freq=2156) Now best 2(cost=842)
Trying 3: spill 156(freq=2320)
Trying 4: spill 154(freq=1584) Now best 4(cost=270)
Trying 5: spill 158(freq=1381)
Spill r154(hr=4, freq=1584) for r171
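The cost figures in the dump above are consistent with a simple rule: the cost of taking a hard reg is the frequency of the pseudo that would be spilled minus the frequency of the reload pseudo (2604 - 1314 = 1290, 2156 - 1314 = 842, 1584 - 1314 = 270). The following is only a sketch of that arithmetic, not LRA's actual code; `pick_spill` and its cost model are hypothetical. The dump does not say why hard reg 5 is not taken even though the same subtraction gives 67. Since r158 is the address operand of insn 56, the sketch assumes that a pseudo used by the reloaded insn itself is not eligible for spilling; that is an assumption, not something the dump states.

```python
# Numbers copied from the LRA dump quoted above.
RELOAD_FREQ = 1314  # freq of the reload pseudo r171

# hard register number -> (pseudo occupying it, that pseudo's freq)
CANDIDATES = {
    0: (157, 2604),
    1: (155, 2606),
    2: (153, 2156),
    3: (156, 2320),
    4: (154, 1584),
    5: (158, 1381),
}

# ASSUMPTION (not stated in the dump): a pseudo used by the insn being
# reloaded is not eligible for spilling. r158 is the address in insn 56.
USED_BY_RELOADED_INSN = {158}


def pick_spill(candidates, reload_freq, excluded):
    """Return (best, trace).

    best is (hard_reg, pseudo, cost) for the cheapest eligible spill;
    trace lists every (hard_reg, cost) that became the new best.
    """
    best = None
    trace = []
    for hard_reg in sorted(candidates):
        pseudo, freq = candidates[hard_reg]
        if pseudo in excluded:
            continue
        cost = freq - reload_freq  # hypothetical cost model
        if best is None or cost < best[2]:
            best = (hard_reg, pseudo, cost)
            trace.append((hard_reg, cost))
    return best, trace


best, trace = pick_spill(CANDIDATES, RELOAD_FREQ, USED_BY_RELOADED_INSN)
print(best)
print(trace)
```

Under these assumptions the trace steps through hard regs 0, 2 and 4, matching the "Now best" lines in the dump.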
So LRA chooses to spill p154 as the cheapest one. Skipping some of the
transformations it tried (such as optional reloads and inheritance), we
have at the end of LRA:
74: flags:CCZ=cmp([sp:SI+0x4],0)
which is transformed in the peephole2 pass into:
367: bx:SI=[sp:SI+0x4]
368: flags:CCZ=cmp(bx:SI,0)
75: pc={(flags:CCZ!=0)?L80:pc}
We cannot reuse si for insn 368, as it is also used in insn #56 (this
is the reload I mentioned above). So we cannot just change [sp:SI+0x4]
to SI, because its value can be corrupted after the 1st loop iteration.
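To make the corruption concrete, here is a toy model of the loop, not compiler output. The stack values (0 and 7) and the function `run` are hypothetical, and the control flow is reduced to a fixed number of iterations. It only shows that once insn 278 reloads 12(%esp) into %esi inside the loop, a test that reads %esi sees a different value from the second iteration on, while a test that reads 4(%esp) does not.

```python
def run(iterations, test_from_register):
    """Return the values seen by the loop-top test (insns 367/368)."""
    # Hypothetical contents: 4(%esp) is the slot written by insn 251,
    # 12(%esp) is the slot that insn 278 loads for the compare in insn 56.
    stack = {4: 0, 12: 7}
    # Before the loop, %esi still holds the value insn 251 stored.
    regs = {"esi": stack[4]}
    tested = []
    for _ in range(iterations):
        # Loop-top test: read either the register or the stack slot.
        value = regs["esi"] if test_from_register else stack[4]
        tested.append(value)
        # Insn 278: the reload for insn 56 overwrites %esi.
        regs["esi"] = stack[12]
    return tested


print(run(2, test_from_register=False))
print(run(2, test_from_register=True))
```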
The change in an earlier part of the compiler resulted in different code
before RA, and so we have different code after RA.
So this bug will not be fixed, at least not by changes in RA.