[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code

lucier at math dot purdue dot edu gcc-bugzilla@gcc.gnu.org
Sat Dec 6 16:39:00 GMT 2008



------- Comment #39 from lucier at math dot purdue dot edu  2008-12-06 16:37 -------
I may have narrowed down the problem a bit.

With this compiler (revision 118491):

pythagoras-277% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061105 (experimental)

one gets (on a faster machine than in previous reports)

(time (direct-fft-recursive-4 a table))
    133 ms real time
    140 ms cpu time (140 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

With this compiler (revision 118474):

pythagoras-24% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061104 (experimental)

one gets

(time (direct-fft-recursive-4 a table))
    116 ms real time
    108 ms cpu time (108 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

and the assembly code generated from direct.i by the later compiler (revision
118491) shows the typical problem.
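
To put the two timings above side by side, here is a small sketch that computes the relative slowdown between revision 118474 and revision 118491. The function name slowdown_percent is only illustrative, and the numbers are single runs copied from the output above, so treat the result as a rough indication rather than a careful measurement.

```python
def slowdown_percent(before_ms: float, after_ms: float) -> float:
    """Relative increase of after_ms over before_ms, in percent."""
    return (after_ms - before_ms) / before_ms * 100.0


# Timings copied from the two runs above, in milliseconds:
# (revision 118474, revision 118491)
timings = {
    "real": (116, 133),
    "cpu": (108, 140),
}

for label, (r118474, r118491) in timings.items():
    print(f"{label}: {slowdown_percent(r118474, r118491):.1f}% slower")
```

With these inputs the real time comes out roughly 15% slower and the cpu time roughly 30% slower with the later revision.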

Paolo may have been right about fwprop; this patch was installed that day:

Author: bonzini
Date: Sat Nov  4 08:36:45 2006
New Revision: 118475

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475
Log:
2006-11-03  Paolo Bonzini  <bonzini@gnu.org>
            Steven Bosscher  <stevenb.gcc@gmail.com>

        * fwprop.c: New file.
        * Makefile.in: Add fwprop.o.
        * tree-pass.h (pass_rtl_fwprop, pass_rtl_fwprop_with_addr): New.
        * passes.c (init_optimization_passes): Schedule forward propagation.
        * rtlanal.c (loc_mentioned_in_p): Support NULL value of the second
        parameter.
        * timevar.def (TV_FWPROP): New.
        * common.opt (-fforward-propagate): New.
        * opts.c (decode_options): Enable forward propagation at -O2.
        * gcse.c (one_cprop_pass): Do not run local cprop unless touching
        jumps.
        * cse.c (fold_rtx_subreg, fold_rtx_mem, fold_rtx_mem_1, find_best_addr,
        canon_for_address, table_size): Remove.
        (new_basic_block, insert, remove_from_table): Remove references to
        table_size.
        (fold_rtx): Process SUBREGs and MEMs with equiv_constant, make
        simplification loop more straightforward by not calling fold_rtx
        recursively.
        (equiv_constant): Move here a small part of fold_rtx_subreg,
        do not call fold_rtx.  Call avoid_constant_pool_reference
        to process MEMs.
        * recog.c (canonicalize_change_group): New.
        * recog.h (canonicalize_change_group): New.

        * doc/invoke.texi (Optimization Options): Document fwprop.
        * doc/passes.texi (RTL passes): Document fwprop.


Added:
    trunk/gcc/fwprop.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/common.opt
    trunk/gcc/cse.c
    trunk/gcc/doc/invoke.texi
    trunk/gcc/doc/passes.texi
    trunk/gcc/gcse.c
    trunk/gcc/opts.c
    trunk/gcc/passes.c
    trunk/gcc/recog.c
    trunk/gcc/recog.h
    trunk/gcc/rtlanal.c
    trunk/gcc/timevar.def
    trunk/gcc/tree-pass.h


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928


