[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
lucier at math dot purdue dot edu
gcc-bugzilla@gcc.gnu.org
Sat Dec 6 16:39:00 GMT 2008
------- Comment #39 from lucier at math dot purdue dot edu 2008-12-06 16:37 -------
I may have narrowed down the problem a bit.
With this compiler (revision 118491):
pythagoras-277% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061105 (experimental)
one gets (on a faster machine than the one used for previous reports)
(time (direct-fft-recursive-4 a table))
133 ms real time
140 ms cpu time (140 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
With this compiler (revision 118474):
pythagoras-24% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061104 (experimental)
one gets
(time (direct-fft-recursive-4 a table))
116 ms real time
108 ms cpu time (108 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
and the assembly code generated from direct.i by the later compiler shows the
typical problem.
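For reference, here is the relative slowdown of revision 118491 against revision 118474, based only on the single timing shown for each:

    \frac{140 - 108}{108} \approx 0.296 \quad \text{(CPU time, about 30\% slower)}
    \frac{133 - 116}{116} \approx 0.147 \quad \text{(real time, about 15\% slower)}

These figures compare two trunk revisions on the faster machine. They are not the 4.2.2 versus 4.3/4.4 comparison behind the 22% figure in the summary line.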
Paolo may have been right about fwprop; this patch (revision 118475, just after
118474) was installed that day:
Author: bonzini
Date: Sat Nov 4 08:36:45 2006
New Revision: 118475
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475
Log:
2006-11-03 Paolo Bonzini <bonzini@gnu.org>
Steven Bosscher <stevenb.gcc@gmail.com>
* fwprop.c: New file.
* Makefile.in: Add fwprop.o.
* tree-pass.h (pass_rtl_fwprop, pass_rtl_fwprop_with_addr): New.
* passes.c (init_optimization_passes): Schedule forward propagation.
* rtlanal.c (loc_mentioned_in_p): Support NULL value of the second
parameter.
* timevar.def (TV_FWPROP): New.
* common.opt (-fforward-propagate): New.
* opts.c (decode_options): Enable forward propagation at -O2.
* gcse.c (one_cprop_pass): Do not run local cprop unless touching
jumps.
* cse.c (fold_rtx_subreg, fold_rtx_mem, fold_rtx_mem_1, find_best_addr,
canon_for_address, table_size): Remove.
(new_basic_block, insert, remove_from_table): Remove references to
table_size.
(fold_rtx): Process SUBREGs and MEMs with equiv_constant, make
simplification loop more straightforward by not calling fold_rtx
recursively.
(equiv_constant): Move here a small part of fold_rtx_subreg,
do not call fold_rtx. Call avoid_constant_pool_reference
to process MEMs.
* recog.c (canonicalize_change_group): New.
* recog.h (canonicalize_change_group): New.
* doc/invoke.texi (Optimization Options): Document fwprop.
* doc/passes.texi (RTL passes): Document fwprop.
Added:
trunk/gcc/fwprop.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/common.opt
trunk/gcc/cse.c
trunk/gcc/doc/invoke.texi
trunk/gcc/doc/passes.texi
trunk/gcc/gcse.c
trunk/gcc/opts.c
trunk/gcc/passes.c
trunk/gcc/recog.c
trunk/gcc/recog.h
trunk/gcc/rtlanal.c
trunk/gcc/timevar.def
trunk/gcc/tree-pass.h
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928