This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/57534] New: Performance regression versus 4.7.3, 4.8.1 is ~15% slower


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57534

            Bug ID: 57534
           Summary: Performance regression versus 4.7.3, 4.8.1 is ~15%
                    slower
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ncahill_alt at yahoo dot com

Created attachment 30261
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30261&action=edit
Reduced source code - timing functions

On x86, GCC 4.8.1 produces code that runs about 15% slower than the code
produced by 4.7.3 for a particular benchmark I ran.

Whatever is wrong appears to be in this part of the generated code:

 80486e5:       d9 ee                   fldz   
 80486e7:       d9 c0                   fld    %st(0)
 80486e9:       8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
 80486f0:       8d 04 f5 00 00 00 00    lea    0x0(,%esi,8),%eax
 80486f7:       dd 04 f3                fldl   (%ebx,%esi,8)
 80486fa:       dc 44 03 08             faddl  0x8(%ebx,%eax,1)
 80486fe:       dc 44 03 10             faddl  0x10(%ebx,%eax,1)
 8048702:       dc 44 03 18             faddl  0x18(%ebx,%eax,1)
 8048706:       de c2                   faddp  %st,%st(2)
 8048708:       dd 44 03 20             fldl   0x20(%ebx,%eax,1)
 804870c:       dc 44 03 28             faddl  0x28(%ebx,%eax,1)
 8048710:       dc 44 03 30             faddl  0x30(%ebx,%eax,1)
 8048714:       dc 44 03 38             faddl  0x38(%ebx,%eax,1)
 8048718:       8d 46 08                lea    0x8(%esi),%eax
 804871b:       39 c7                   cmp    %eax,%edi
 804871d:       de c1                   faddp  %st,%st(1)
 804871f:       7f 0e                   jg     804872f 
 8048721:       a1 34 91 04 08          mov    0x8049134,%eax
 8048726:       85 c0                   test   %eax,%eax
 8048728:       74 0e                   je     8048738 
 804872a:       83 c5 01                add    $0x1,%ebp
 804872d:       31 c0                   xor    %eax,%eax
 804872f:       89 c6                   mov    %eax,%esi
 8048731:       eb bd                   jmp    80486f0 
 8048733:       90                      nop
 8048734:       8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
 8048738:       dd 5c 24 10             fstpl  0x10(%esp)
 804873c:       83 c6 10                add    $0x10,%esi
 804873f:       dd 5c 24 08             fstpl  0x8(%esp)
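
Read back into C, the instructions above have roughly the following shape. This
is only a sketch reconstructed from the disassembly, not the attached source:
the names sum_until_stopped, keep_going, limit and wraps are hypothetical, the
starting index of 0 is assumed, and keep_going stands in for the word loaded
from 0x8049134, which is assumed here to be a flag cleared by the timer code.

```c
#include <assert.h>

/* Hypothetical stand-in for the global at 0x8049134; assumed to be
   cleared asynchronously (for example by a timer signal handler). */
static volatile int keep_going = 1;

/* Sums x[0..limit) eight doubles per iteration into two accumulators,
   restarting from index 0 until keep_going reads as zero.
   Returns the number of restarts. limit must be a positive multiple of 8. */
static long sum_until_stopped(const double *x, int limit,
                              double *sum_a, double *sum_b)
{
    double a = 0.0, b = 0.0;   /* fldz; fld %st(0) */
    long wraps = 0;            /* add $0x1,%ebp */
    int i = 0;                 /* assumed initial index */

    assert(limit > 0 && limit % 8 == 0);
    for (;;) {
        a += x[i] + x[i + 1] + x[i + 2] + x[i + 3];     /* faddp %st,%st(2) */
        b += x[i + 4] + x[i + 5] + x[i + 6] + x[i + 7]; /* faddp %st,%st(1) */
        i += 8;                /* lea 0x8(%esi),%eax */
        if (i < limit)         /* cmp %eax,%edi; jg */
            continue;
        if (!keep_going)       /* mov 0x8049134,%eax; test; je */
            break;
        wraps++;
        i = 0;                 /* xor %eax,%eax */
    }
    *sum_a = a;                /* the two fstpl stores after the loop */
    *sum_b = b;
    return wraps;
}
```

If keep_going is already 0 on entry, the sketch makes exactly one pass over the
array and returns 0 restarts; in the benchmark the flag would presumably be
cleared while the loop is still running.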

This is the command line: gcc -O2 reduceme.c timer.o -o cachebench

This comes from a benchmark (llcbench, GPL software). It uses timers, which may
be a problem: if I preprocess the timing code, it may stop working.  For now
I'll attach the reduced main code, and I'll work on getting the timing code
included very soon.  I'll also test with 4.8.0 to see whether that version is
affected too.

Attached is the reduced code minus the timing functions.  Uncommenting the
commented line in the source code removes the bug.

Thanks.
Neil.

