This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



zero extensions for PPro



As I mentioned a short while ago, I have some changes to improve how we
handle zero extensions for the x86 port.

First, when optimizing for size, "movz" wins consistently over masking with
an "and", regardless of the source operand.

PPro/PII (speed):

  movz reg,reg			--> 1 uop  P01
  movz mem,reg			--> 1 uop  P2

  and imm,reg			--> 1 uop  P01
  mov mem,reg;and imm,reg	--> 2 uops P2, P01

  movz never needs more uops than the and-based sequence (one fewer for a
  memory source) and is smaller, so we always want to use movz on PPro/PII.
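
To make the equivalence between the two forms concrete, here is a minimal C
sketch. It is not part of the patch and the function names are only
illustrative. It models a 16-bit to 32-bit zero extension once as a
narrow-then-widen conversion (roughly what movzwl computes) and once as an
in-place mask (roughly what andl $65535 computes), and checks that the two
agree whatever the upper bits held beforehand.

```c
#include <stdint.h>

/* Illustrative model of "movzwl": take the low 16 bits of x and widen
   them to 32 bits, so the upper half of the result is zero. */
uint32_t zext16_movz(uint32_t x)
{
    return (uint32_t)(uint16_t)x;
}

/* Illustrative model of "andl $65535": clear the upper 16 bits in place. */
uint32_t zext16_and(uint32_t x)
{
    return x & 0xFFFFu;
}

/* Returns 1 if both forms agree for every 16-bit payload combined with
   the given upper-half garbage, 0 otherwise. */
int zext16_forms_agree(uint32_t upper_garbage)
{
    for (uint32_t lo = 0; lo <= 0xFFFFu; lo++) {
        uint32_t x = (upper_garbage & 0xFFFF0000u) | lo;
        if (zext16_movz(x) != zext16_and(x))
            return 0;
    }
    return 1;
}
```

Since the results are identical, the choice between the two is purely a
matter of size and uop count, which is what the table above compares.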


For other processors I think our code generation was reasonable, but it tended
to generate multiple instructions from within the extension patterns.  Those
cases should be using a splitter.

We want to use a separate pattern for space/PPro opts so that we can have the
constraints precisely match what the pattern supports.
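
As a rough illustration of how the two anonymous patterns divide the work,
the C sketch below models the insn conditions used in the patch: one pattern
is enabled by "optimize_size || cpu is PPro" and the other by the negation of
that same test, so exactly one of them is eligible for any configuration.
The enum and function names here are hypothetical stand-ins, not GCC
identifiers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU the compiler is tuning for. */
enum cpu_model { CPU_I386, CPU_I486, CPU_PENTIUM, CPU_PENTIUMPRO };

/* Which of the two anonymous zero-extension patterns is eligible. */
enum zext_pattern { ZEXT_MOVZ_ONLY, ZEXT_GENERIC };

/* Models the condition on the new movz-only pattern. */
bool movz_only_condition(bool optimize_size, enum cpu_model cpu)
{
    return optimize_size || cpu == CPU_PENTIUMPRO;
}

/* The generic pattern carries the negated condition, so the two
   patterns are mutually exclusive and together cover every case. */
enum zext_pattern select_zext_pattern(bool optimize_size, enum cpu_model cpu)
{
    return movz_only_condition(optimize_size, cpu) ? ZEXT_MOVZ_ONLY
                                                   : ZEXT_GENERIC;
}
```

Because the conditions are exact complements, each pattern can carry only
the constraints that make sense for its own output template.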


This patch deals with getting the size and PPro/PII optimization cases
correct.  I'll send the splitter changes separately.  Then we need to update
the andXX patterns.

My results with this patch are similar to HJ's.  Compression gets a little
bit faster; decompression is quite a bit faster.




	* i386.md (zero_extendhisi2): Split into an expander and anonymous
	pattern.  Add new anonymous pattern for use when optimizing for
	size or for the PPro.
	(zero_extendqihi2, zero_extendqisi2): Likewise.

Index: i386.md
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/config/i386/i386.md,v
retrieving revision 1.51
diff -c -3 -p -r1.51 i386.md
*** i386.md	1999/03/08 23:31:28	1.51
--- i386.md	1999/03/09 08:44:50
***************
*** 1789,1798 ****
  ;;- zero extension instructions
  ;; See comments by `andsi' for when andl is faster than movzx.
  
! (define_insn "zero_extendhisi2"
    [(set (match_operand:SI 0 "register_operand" "=r,&r,?r")
  	(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0,rm,rm")))]
!   ""
    "*
    {
    rtx xops[2];
--- 1789,1813 ----
  ;;- zero extension instructions
  ;; See comments by `andsi' for when andl is faster than movzx.
  
! (define_expand "zero_extendhisi2"
!   [(set (match_operand:SI 0 "register_operand" "")
! 	(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "")))]
!   ""
!   "")
! 
! ;; When optimizing for the PPro/PII or code size, always use movzwl.
! ;; We want to use a different pattern so we can use different constraints
! ;; than the generic pattern.
! (define_insn ""
!   [(set (match_operand:SI 0 "register_operand" "=r")
! 	(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "rm")))]
!   "(optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
!   "* return AS2 (movz%W0%L0,%1,%0);")
! 
! (define_insn ""
    [(set (match_operand:SI 0 "register_operand" "=r,&r,?r")
  	(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0,rm,rm")))]
!   "! (optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
    "*
    {
    rtx xops[2];
***************
*** 1852,1862 ****
         (and:SI (match_dup 0)
  	       (const_int 65535)))]
    "operands[2] = gen_rtx_REG (HImode, true_regnum (operands[0]));")
  
! (define_insn "zero_extendqihi2"
    [(set (match_operand:HI 0 "register_operand" "=q,&q,?r")
  	(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
!   ""
    "*
    {
    rtx xops[2];
--- 1867,1890 ----
         (and:SI (match_dup 0)
  	       (const_int 65535)))]
    "operands[2] = gen_rtx_REG (HImode, true_regnum (operands[0]));")
+ 
+ (define_expand "zero_extendqihi2"
+   [(set (match_operand:HI 0 "register_operand" "")
+ 	(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "")))]
+   ""
+   "")
+ 
+ (define_insn ""
+   [(set (match_operand:HI 0 "register_operand" "=r")
+ 	(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
+   "optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO"
+ 
+   "*  return AS2 (movz%B0%W0,%1,%0);")
  
! (define_insn ""
    [(set (match_operand:HI 0 "register_operand" "=q,&q,?r")
  	(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
!   "! (optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
    "*
    {
    rtx xops[2];
***************
*** 1934,1943 ****
      FAIL;
    operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));")
  
! (define_insn "zero_extendqisi2"
    [(set (match_operand:SI 0 "register_operand" "=q,&q,?r")
  	(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
!   ""
    "*
    {
    rtx xops[2];
--- 1962,1983 ----
      FAIL;
    operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));")
  
! (define_expand "zero_extendqisi2"
!   [(set (match_operand:SI 0 "register_operand" "")
! 	(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "")))]
!   ""
!   "")
! 
! (define_insn ""
!   [(set (match_operand:SI 0 "register_operand" "=r")
! 	(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
!   "optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO"
!   "* return AS2 (movz%B0%L0,%1,%0);")
! 
! (define_insn ""
    [(set (match_operand:SI 0 "register_operand" "=q,&q,?r")
  	(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
!   "! (optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
    "*
    {
    rtx xops[2];






