This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: This code runs *very* slowly


On Mon, Nov 02, 1998 at 01:07:20PM +0530, Rahul Siddharthan wrote:
> We have four Digital Alpha workstations, two of which (333 MHz)
> run Digital Unix 4.0 with Digital's c compiler, and two (433 MHz)
> run Red Hat Linux 5.0 with egcs 1.1b. I tried compiling on both,
> just to compare the speed. When I have only near-neighbour
> interactions, I found that the Digital C code runs around 10%-20%
> faster than the egcs code -- despite the machine clock speed
> being around 25% less. This speed difference was acceptable to me
> since I don't expect gcc to be as fast as a compiler specifically
> optimized for the Alpha architecture.
> 
> When I tried including the next-neighbour interactions through
> #defines, the Digital C code slowed a bit (as expected), but the
> egcs code slowed an enormous lot: it now runs around a factor of
> 4 slower than the Digital code. This is way unacceptable: even
> if there are three other jobs running on the Digital Unix machine
> it still runs faster than on an otherwise idle Linux machine.

You've pointed out a relatively serious bit of lossage in the
common subexpression elimination pass.  You've got hordes of
identical expressions that were not identified as such.

It turns out that a bit of reformulation in the back end can
get the job done without having to uglify CSE any more than
it already is.

I was not patient enough to let your test case run to completion,
so I don't know how runtime is affected, but the size of the
code produced is reduced from 55558 to 29901 bytes, which under
no circumstances could be bad.  The static number of modulus
operations is reduced from 889 to 111; at ~ 100 cycles apiece,
that's got to be no small thing.

Unfortunately, the patch doesn't really do anything for the 
compilation time.  For the record, the top three hogs for this
example are global-alloc, loop, and cse1 at 30%, 23% and 14%
respectively.

Please let me know how this works for you.


r~


	* alpha.md (addsi3, subsi3): Expand to a temporary in DImode to 
	expose this mid-point to CSE.


Index: config/alpha/alpha.md
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/config/alpha/alpha.md,v
retrieving revision 1.55
diff -c -p -d -r1.55 alpha.md
*** alpha.md	1998/09/19 12:14:35	1.55
--- alpha.md	1998/11/05 11:04:44
***************
*** 426,435 ****
  		 (match_operand:SI 2 "add_operand" "")))]
    ""
    "
! { emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (DImode, operands[0]),
! 			  gen_rtx_PLUS (DImode,
! 					gen_lowpart (DImode, operands[1]),
! 					gen_lowpart (DImode, operands[2]))));
    DONE;
  } ")
  
--- 426,436 ----
  		 (match_operand:SI 2 "add_operand" "")))]
    ""
    "
! {
!   rtx tmp = gen_reg_rtx (DImode);
!   emit_insn (gen_adddi3 (tmp, gen_lowpart (DImode, operands[1]),
! 			 gen_lowpart (DImode, operands[2])));
!   emit_move_insn (operands[0], gen_lowpart (SImode, tmp));
    DONE;
  } ")
  
***************
*** 712,721 ****
  		  (match_operand:SI 2 "reg_or_8bit_operand" "")))]
    ""
    "
! { emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (DImode, operands[0]),
! 			  gen_rtx_MINUS (DImode,
! 					 gen_lowpart (DImode, operands[1]),
! 					 gen_lowpart (DImode, operands[2]))));
    DONE;
  } ")
  
--- 713,723 ----
  		  (match_operand:SI 2 "reg_or_8bit_operand" "")))]
    ""
    "
! {
!   rtx tmp = gen_reg_rtx (DImode);
!   emit_insn (gen_subdi3 (tmp, gen_lowpart (DImode, operands[1]),
! 			 gen_lowpart (DImode, operands[2])));
!   emit_move_insn (operands[0], gen_lowpart (SImode, tmp));
    DONE;
  } ")
  

