This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: This code runs *very* slowly
- To: Rahul Siddharthan <rsidd at physics dot iisc dot ernet dot in>, egcs-bugs at cygnus dot com
- Subject: Re: This code runs *very* slowly
- From: Richard Henderson <rth at cygnus dot com>
- Date: Thu, 5 Nov 1998 03:28:12 -0800
- Cc: egcs-patches at cygnus dot com
- References: <Pine.LNX.4.05.9810281340200.24889-100000@sys3.physics.iisc.ernet.in>
- Reply-To: Richard Henderson <rth at cygnus dot com>
On Mon, Nov 02, 1998 at 01:07:20PM +0530, Rahul Siddharthan wrote:
> We have four Digital Alpha workstations, two of which (333 MHz)
> run Digital Unix 4.0 with Digital's c compiler, and two (433 MHz)
> run Red Hat Linux 5.0 with egcs 1.1b. I tried compiling on both,
> just to compare the speed. When I have only near-neighbour
> interactions, I found that the Digital C code runs around 10%-20%
> faster than the egcs code -- despite the machine clock speed
> being around 25% less. This speed difference was acceptable to me
> since I don't expect gcc to be as fast as a compiler specifically
> optimized for the Alpha architecture.
>
> When I tried including the next-neighbour interactions through
> #defines, the Digital C code slowed a bit (as expected), but the
> egcs code slowed an enormous lot: it now runs around a factor of
> 4 slower than the Digital code. This is way unacceptable: even
> if there are three other jobs running on the Digital Unix machine
> it still runs faster than on an otherwise idle Linux machine.
You've pointed out a relatively serious bit of losage in the
common subexpression elimination pass. You've got hordes of
identical expressions that were not identified as such.
It turns out that a bit of reformulation in the back end can
get the job done without having to uglify CSE any more than
it already is.
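For illustration only, here is a minimal sketch of the kind of source
that produces hordes of identical expressions. It is not taken from the
test case: the function names, the flat lattice layout, and the periodic
wrap-around are all assumptions. The naive form spells out 12 modulus
operations; hoisting the four distinct wrapped indices by hand, which is
what CSE is supposed to do automatically, leaves 4.

```c
/* Hypothetical n x n lattice stored row-major in s[], with periodic
   boundaries.  Neither function comes from the original test case. */

/* Naive form: each wrapped index is recomputed wherever it is used
   (12 modulus operations as written). */
static int neighbour_sum_naive(const int *s, int n, int i, int j)
{
    return s[((i + 1) % n) * n + j]
         + s[((i + n - 1) % n) * n + j]
         + s[i * n + (j + 1) % n]
         + s[i * n + (j + n - 1) % n]
         /* next-neighbour (diagonal) terms repeat the same four moduli */
         + s[((i + 1) % n) * n + (j + 1) % n]
         + s[((i + 1) % n) * n + (j + n - 1) % n]
         + s[((i + n - 1) % n) * n + (j + 1) % n]
         + s[((i + n - 1) % n) * n + (j + n - 1) % n];
}

/* Hand-hoisted form: the four distinct wrapped indices are computed
   once and reused (4 modulus operations). */
static int neighbour_sum_hoisted(const int *s, int n, int i, int j)
{
    const int ip = (i + 1) % n;      /* row below, wrapped */
    const int im = (i + n - 1) % n;  /* row above, wrapped */
    const int jp = (j + 1) % n;      /* column right, wrapped */
    const int jm = (j + n - 1) % n;  /* column left, wrapped */

    return s[ip * n + j] + s[im * n + j]
         + s[i * n + jp] + s[i * n + jm]
         + s[ip * n + jp] + s[ip * n + jm]
         + s[im * n + jp] + s[im * n + jm];
}
```

Both functions return the same value for any valid (i, j); only the
amount of repeated index arithmetic handed to the optimizer differs.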
I was not patient enough to let your test case run to completion,
so I don't know how runtime is affected, but the size of the
code produced is reduced from 55558 to 29901 bytes, which under
no circumstances could be bad. The static number of modulus
operations is reduced from 889 to 111; at ~ 100 cycles apiece,
that's got to be no small thing.
Unfortunately, the patch doesn't really do anything for the
compilation time. For the record, the top three hogs for this
example are global-alloc, loop, and cse1 at 30%, 23% and 14%
respectively.
Please let me know how this works for you.
r~
* alpha.md (addsi3, subsi3): Expand to a temporary in DImode to
expose this mid-point to CSE.
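As a rough model of why the rewrite is safe, the sketch below shows in
plain C that a 32-bit add or subtract can be carried out in a 64-bit
temporary and then narrowed without changing the result. It models the
arithmetic only, not the RTL; the function names are hypothetical, and
unsigned types are used so that wrap-around is well defined in C.

```c
#include <stdint.h>

/* 32-bit add and subtract done directly (wrap modulo 2^32). */
static uint32_t add_si(uint32_t a, uint32_t b) { return a + b; }
static uint32_t sub_si(uint32_t a, uint32_t b) { return a - b; }

/* The same operations done in a 64-bit temporary and then narrowed,
   loosely mirroring "tmp = DImode op; result = low SImode part of tmp". */
static uint32_t add_via_di(uint32_t a, uint32_t b)
{
    uint64_t tmp = (uint64_t)a + (uint64_t)b;  /* full-width sum */
    return (uint32_t)tmp;                      /* keep the low 32 bits */
}

static uint32_t sub_via_di(uint32_t a, uint32_t b)
{
    uint64_t tmp = (uint64_t)a - (uint64_t)b;  /* full-width difference */
    return (uint32_t)tmp;                      /* keep the low 32 bits */
}
```

Because the low 32 bits agree in every case, the full-width intermediate
can live in a pseudo-register of its own, which is presumably what lets
CSE see it as a reusable value.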
Index: config/alpha/alpha.md
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/config/alpha/alpha.md,v
retrieving revision 1.55
diff -c -p -d -r1.55 alpha.md
*** alpha.md 1998/09/19 12:14:35 1.55
--- alpha.md 1998/11/05 11:04:44
***************
*** 426,435 ****
(match_operand:SI 2 "add_operand" "")))]
""
"
! { emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (DImode, operands[0]),
! gen_rtx_PLUS (DImode,
! gen_lowpart (DImode, operands[1]),
! gen_lowpart (DImode, operands[2]))));
DONE;
} ")
--- 426,436 ----
(match_operand:SI 2 "add_operand" "")))]
""
"
! {
! rtx tmp = gen_reg_rtx (DImode);
! emit_insn (gen_adddi3 (tmp, gen_lowpart (DImode, operands[1]),
! gen_lowpart (DImode, operands[2])));
! emit_move_insn (operands[0], gen_lowpart (SImode, tmp));
DONE;
} ")
***************
*** 712,721 ****
(match_operand:SI 2 "reg_or_8bit_operand" "")))]
""
"
! { emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (DImode, operands[0]),
! gen_rtx_MINUS (DImode,
! gen_lowpart (DImode, operands[1]),
! gen_lowpart (DImode, operands[2]))));
DONE;
} ")
--- 713,723 ----
(match_operand:SI 2 "reg_or_8bit_operand" "")))]
""
"
! {
! rtx tmp = gen_reg_rtx (DImode);
! emit_insn (gen_subdi3 (tmp, gen_lowpart (DImode, operands[1]),
! gen_lowpart (DImode, operands[2])));
! emit_move_insn (operands[0], gen_lowpart (SImode, tmp));
DONE;
} ")