This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown
- From: "steven at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 13 Nov 2012 23:37:52 +0000
- Subject: [Bug middle-end/52285] [4.7/4.8 Regression] libgcrypt _gcry_burn_stack slowdown
- Auto-submitted: auto-generated
- References: <bug-52285-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285
--- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> 2012-11-13 23:37:52 UTC ---
Created attachment 28678
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28678
Gross hack
(In reply to comment #11)
> If loops are still around at LRA time, perhaps LRA should consider putting
> it before loop if register pressure is low, or LIM could just have extra
> code for this
Unfortunately, loops are destroyed _just_ before LRA, at the end of IRA.
IRA has its own loop tree but that is destroyed before LRA, too.
> I'm not saying it must be LIM, I'm
> just looking for suggestions where to perform this.
LIM may be too early. I've experimented with the attached patch (based off
some other patch for invariant addresses that was bit-rotting on a shelf)
and I had to resort to some crude hacks to make loop-invariant even just
consider moving the bare frame_pointer_rtx, like manually setting the cost
to something high because set_src_cost(frame_pointer_rtx)==0. The result
is this code:
foo:
        leaq    -72(%rsp), %rcx
        leaq    -8(%rsp), %rdx  // A Pyrrhic victory...
        .p2align 4,,10
        .p2align 3
.L5:
        movq    %rcx, %rax
        .p2align 4,,10
        .p2align 3
.L3:
        movb    $0, (%rax)
        addq    $1, %rax
        cmpq    %rdx, %rax
        jne     .L3
        subl    $64, %edi
        testl   %edi, %edi
        jg      .L5
        rep ret
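
For readers following along without the test case at hand, the loop structure
of the assembly above corresponds roughly to the C below. This is only a
sketch reconstructed from the listing, not the libgcrypt source: the names
burn_stack_sketch and BURN_CHUNK are made up, and the pass counter is an
addition so that the behaviour can be observed from a caller. The inner loop
is .L3 (a byte-wise store through a volatile pointer), the outer loop is .L5
(presumably the tail recursion turned into a loop), and the two bounds
correspond to the two leaq results in %rcx and %rdx.

```c
#include <stddef.h>

/* Hypothetical name; the value 64 matches "subl $64, %edi" in the listing. */
enum { BURN_CHUNK = 64 };

/* Sketch only: zero a 64-byte stack buffer once per 64 requested bytes.
   Returns the number of passes (not something the assembly does; added
   here purely so the loop count is visible to a caller). */
int burn_stack_sketch(int bytes)
{
    int passes = 0;

    do {                                        /* outer loop, .L5 */
        char buf[BURN_CHUNK];
        volatile char *p = buf;                 /* start address, cf. %rcx */
        volatile char *end = buf + sizeof buf;  /* end address, cf. %rdx */

        while (p != end)                        /* inner loop, .L3 */
            *p++ = 0;                           /* movb $0, (%rax) */

        bytes -= BURN_CHUNK;                    /* subl $64, %edi */
        passes++;
    } while (bytes > 0);                        /* testl / jg .L5 */

    return passes;
}
```

Because the loop is bottom-tested, the buffer is wiped at least once even for
a request of 0 bytes, and a request of 65 bytes takes two passes.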
Need to think about this a bit more, perhaps postreload-gcse can be used
for this instead of LIM...