This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug c/59967] New: Performance regression from 4.7.x to 4.8.x (loop not unrolled)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59967

            Bug ID: 59967
           Summary: Performance regression from 4.7.x to 4.8.x (loop not
                    unrolled)
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chbreitkopf at gmail dot com

Created attachment 31967
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31967&action=edit
preprocessed source of ray/src/rt/ambient.c

gcc 4.8.x generates 10-15% slower code compared to 4.7.x for Mark Stock's
radiance benchmark (http://markjstock.org/pages/rad_bench.html).

I observed this regression on Linux x86_64, and with different CPUs (Ivy
Bridge, Haswell, AMD Phenom, Kaveri). I had suspected the new register
allocator, but the actual cause is a difference in loop unrolling.

The hotspot is the nested loops with the recursive call at the end of the
sumambient() function. When using -Ofast, gcc 4.7.x will unroll the outer loop,
which results in some optimization possibilities in the inner loop. gcc 4.8.x
does not unroll the outer loop. -funroll-loops does not change the behavior.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]