[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32

Wed Jun 20 09:28:00 GMT 2012

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2012-06-20
          Component|c                           |tree-optimization
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1
   Target Milestone|---                         |4.8.0

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-20 09:27:52 UTC ---
You mean the fix led to recognition of memcpy?  At least I see memcpy
calls in the bad assembly.

There is always a cost consideration for memcpy - does performance recover
with -minline-all-stringops?  I suppose BC is actually very small?

The testcase does not include a runtime part so I can't check myself.

Definitely a byte-wise copy loop as in the .good assembly variant,

 .L5:
-       .loc 1 14 0 is_stmt 1 discriminator 2
-       movzbl  16(%esp,%eax), %edx
-       movb    %dl, (%esi,%eax)
-       leal    1(%eax), %eax
-.LVL5:
-       cmpl    %ebx, %eax
-       jl      .L5

does not look good - even a rep movsb should be faster, no?
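
For illustration only, a minimal C sketch of the kind of loop under
discussion. The actual testcase source is not quoted in this comment, so
the function names (copy_bytes, copy_bytes_memcpy) and the parameter name
bc are assumptions. The first function is a byte-wise copy loop with a
signed bound, which would compile to something like the .L5 loop above;
the second is roughly what the loop becomes once it is recognized as a
memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-wise copy loop; bc is the (possibly very small)
   byte count.  Names are illustrative, not taken from the testcase. */
void copy_bytes(unsigned char *dst, const unsigned char *src, int bc)
{
    for (int i = 0; i < bc; i++)
        dst[i] = src[i];
}

/* Roughly equivalent form after the loop is replaced by a library call.
   The guard keeps the behavior identical for bc <= 0. */
void copy_bytes_memcpy(unsigned char *dst, const unsigned char *src, int bc)
{
    if (bc > 0)
        memcpy(dst, src, (size_t)bc);
}
```

For a very small bc the call overhead of the second form can outweigh the
copy itself, which is the cost consideration raised above. Building with
and without -minline-all-stringops, or with -fno-tree-loop-distribute-patterns,
should show which variant the compiler picks.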