This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 20 Jun 2012 09:27:52 +0000
- Subject: [Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
- Auto-submitted: auto-generated
- References: <bug-53726-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726
Richard Guenther <rguenth at gcc dot gnu.org> changed:
            What|Removed      |Added
----------------------------------------------------------------------------
          Status|UNCONFIRMED  |WAITING
Last reconfirmed|             |2012-06-20
       Component|c            |tree-optimization
              CC|             |rguenth at gcc dot gnu.org
  Ever Confirmed|0            |1
Target Milestone|---          |4.8.0
--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-20 09:27:52 UTC ---
You mean the fix led to recognition of memcpy? At least I see memcpy
calls in the bad assembly.
There is always a cost consideration for memcpy - does performance recover
with -minline-all-stringops? I suppose BC is actually very small?
The testcase does not include a runtime part so I can't check myself.
Definitely a byte-wise copy loop as in the .good assembly variant,
.L5:
- .loc 1 14 0 is_stmt 1 discriminator 2
- movzbl 16(%esp,%eax), %edx
- movb %dl, (%esi,%eax)
- leal 1(%eax), %eax
-.LVL5:
- cmpl %ebx, %eax
- jl .L5
does not look good - even a rep movsb should be faster, no?
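
For reference, a minimal sketch of the kind of source loop under discussion.
The testcase itself is not quoted in this comment, so the function names and
the signature below are hypothetical; only the loop shape (a byte-wise copy
of BC bytes) is taken from the assembly above. The second function shows
what the loop amounts to once the compiler recognizes it as a memcpy idiom.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the testcase loop: copies bc bytes one at a
   time, which is the shape of the .L5 loop in the .good assembly. */
void copy_bytewise(unsigned char *dst, const unsigned char *src, int bc)
{
    for (int i = 0; i < bc; i++)
        dst[i] = src[i];
}

/* Rough equivalent of the loop after idiom recognition: a single memcpy
   call, guarded so that a non-positive count copies nothing, as in the
   loop above. */
void copy_memcpy(unsigned char *dst, const unsigned char *src, int bc)
{
    if (bc > 0)
        memcpy(dst, src, (size_t)bc);
}
```

If BC is small, the call overhead of the second form can outweigh the cost
of the loop, which is the cost consideration mentioned above. Comparing
builds with and without -minline-all-stringops (and, assuming loop
distribution is the pass doing the recognition, with
-fno-tree-loop-distribute-patterns) should show whether the memcpy call
accounts for the drop.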