This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



patch - various i386.md pentium optimizations


Hi,
I've cleaned up my i386 patch and removed all the changes that mostly caused
performance losses. It now brings small performance improvements in all my
tests (15% in the main loop of XaoS), and I am not aware of any problems.
So I'm sending the new version of the patch for discussion and possible
inclusion in egcs.
 
Changes I've made:
 o Split the trivial and most common case of the movdi patterns
   (this also lets the compiler clear a register using xor etc.)
 o Added a define_split for the extendsidi pattern. It now uses a
   combination of an insn (cltd) and a split. The split is used only when
   a register other than eax has been allocated after reload.
   extendsidi now also works on memory operands, so gcc can extend a value
   directly on the stack (probably still faster than spilling registers,
   loading the value, extending it, storing it and reloading the registers).
   It also supports extending from one register/memory location into
   another, to avoid gcc moving the extended value from register to register.
   The same applies to zero_extendsidi.
 o The byte division insn now uses truncate and extend, since otherwise it is
   (IMO, and according to the info file) an incorrect insn. It still doesn't
   match the prototype (it would need a define_expand and an extension of the
   first operand too), but I don't know how to force gcc to generate it. If
   that is impossible, it would be best to remove it.
 o I've added a new function unit, FPMUL, to tell gcc that fpmul instructions
   on the Pentium are not completely pipelined and that it is a good idea to
   put other instructions between them. It seems to work and brings a 5-10%
   speedup in my Mandelbrot loop in XaoS.
 o TEST imm,reg is pairable on the Pentium only when the register is EAX
   (don't ask me why), so I've changed the constraints to prefer eax and set
   the pipe flags properly.
 o Changed the non-pairable NOT to XOR.
 o An FPDIV instruction takes 38 cycles on the Pentium; 2 cycles can overlap
   with other FP instructions and the rest with integer code, so I've
   described this to the scheduler.
 o I've started work on classifying instructions for the U and V pipelines.
   It is just an approximation for now. The patch also sets the prefix
   attribute if an instruction has a 32-bit->16-bit operand-size prefix.
   This should be useful for other processors too.
   I am not sure how to describe the behaviour of some patterns. For example,
   addhi3 behaves very strangely, and I don't know whether it is possible to
   describe when a pairable instruction will be generated and when not.
 o I've made an attempt to specify the behaviour of the Pentium pipelines in
   greater detail, so the scheduler is now less optimistic about them and
   doesn't try to pair imuls, divs and other similar instructions. This
   reduces register lifetimes, so it should help a bit (0-20% in my tests).
   To describe this, I say that some instructions use multiple units
   (non-pairable instructions use both pipes). It seems to work with HAIFA,
   but I am not sure whether it is correct.
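
For readers less familiar with the DImode splits described above: on i386 a DImode value occupies two 32-bit registers or memory words, and the splits replace one DImode operation with SImode moves (plus a shift for sign extension). The C below is only a hypothetical model of the values those split sequences compute, not GCC code; `struct di_parts`, `split_u64`, `zext_si_di` and `sext_si_di` are illustrative names. The high word of the sign extension is computed the way `sarl $31` would, without relying on C's implementation-defined right shift of negative values.

```c
#include <stdint.h>

/* Hypothetical model of split_di: one DImode value as two SImode halves. */
struct di_parts {
    uint32_t lo;  /* low word  (operands[3] in the splits) */
    uint32_t hi;  /* high word (operands[4] in the splits) */
};

/* movdi split: two independent 32-bit moves. */
struct di_parts split_u64(uint64_t v)
{
    struct di_parts p;
    p.lo = (uint32_t)v;
    p.hi = (uint32_t)(v >> 32);
    return p;
}

/* zero_extendsidi2 split: movl src,lo ; movl $0,hi (can become xorl). */
struct di_parts zext_si_di(uint32_t src)
{
    struct di_parts p;
    p.lo = src;
    p.hi = 0;
    return p;
}

/* extendsidi2 split: movl src,hi ; movl src,lo ; sarl $31,hi.
   sarl $31 leaves 0 for non-negative values and 0xffffffff for
   negative ones; (0u - sign bit) computes the same thing portably. */
struct di_parts sext_si_di(int32_t src)
{
    struct di_parts p;
    uint32_t u = (uint32_t)src;   /* well-defined modulo 2^32 */
    p.lo = u;
    p.hi = 0u - (u >> 31);
    return p;
}
```

The overlap condition on the movdi split matters because if the first 32-bit move overwrote a register still holding the second source half, the second move would copy the wrong value.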
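
The divqi3/udivqi3 change is easier to follow next to what the 8-bit divide actually computes: `div r/m8` divides the 16-bit AX by a byte, leaving the quotient in AL and the remainder in AH. So the dividend is HImode, the divisor has to be extended to HImode before the division, and the quotient is truncated back to QImode. A minimal C model of the unsigned case (`model_divb` is a hypothetical name; where the CPU would raise a divide error, on a zero divisor or a quotient that does not fit in AL, the model just returns false):

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "divb %2", i.e. (truncate:QI (udiv:HI ax (zero_extend:HI op2))).
   Returns false where the CPU would raise #DE. */
bool model_divb(uint16_t ax, uint8_t divisor, uint8_t *al, uint8_t *ah)
{
    uint16_t q;

    if (divisor == 0)
        return false;                         /* divide by zero -> #DE   */
    q = (uint16_t)(ax / (uint16_t)divisor);   /* udiv:HI                 */
    if (q > 0xff)
        return false;                         /* quotient overflow -> #DE */
    *al = (uint8_t)q;                         /* truncate:QI             */
    *ah = (uint8_t)(ax % (uint16_t)divisor);  /* remainder ends up in AH */
    return true;
}
```

The truncate only describes the non-faulting case, which is exactly the case the RTL is allowed to assume.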
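
The U/V classification can be illustrated with a deliberately simplified model of Pentium pairing. The sketch assumes only the basic rule that two adjacent instructions issue in the same cycle when the first can run in the U pipe (`u` or `uv`) and the second in the V pipe (`v` or `uv`); it ignores register dependencies, prefixes, multi-cycle instructions and FXCH pairing. The enum and `issue_cycles` are hypothetical names chosen to mirror the values of the new `pipes` attribute.

```c
#include <stddef.h>

/* Hypothetical pairing classes mirroring the "pipes" attribute
   values none/u/v/uv (fx is left out of this model). */
enum pipes { PIPE_NONE, PIPE_U, PIPE_V, PIPE_UV };

/* Count issue cycles for straight-line code under the simplified rule:
   pair instruction i with i+1 when i may go to U and i+1 may go to V. */
int issue_cycles(const enum pipes *seq, size_t n)
{
    int cycles = 0;
    size_t i = 0;

    while (i < n) {
        int first_in_u = seq[i] == PIPE_U || seq[i] == PIPE_UV;
        int next_in_v = i + 1 < n
                        && (seq[i + 1] == PIPE_V || seq[i + 1] == PIPE_UV);

        if (first_in_u && next_in_v)
            i += 2;   /* paired: U and V issue in the same cycle */
        else
            i += 1;   /* issues alone; non-pairable code occupies both pipes */
        cycles++;
    }
    return cycles;
}
```

Under this model a v-only instruction in the first slot, or a non-pairable one anywhere, costs a whole cycle, which is the effect the multi-unit reservations try to convey to the scheduler.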

Things I am currently working on (not included in this patch), but for which
I can't figure out the correct solution. I would be glad if someone could
point me in the right direction.

 o To split DImode arithmetic I would need to write patterns for adc, shld
   and similar instructions. How should I do this?
 o Split the divmod patterns so that the sign (or, for unsigned division,
   zero) extension is done separately. There are two choices:
   1) Use a new temporary extended register. The problem is that global.c
      for some reason doesn't allocate this new register to eax.
   2) Use the hard register eax. The problem is that gcc then doesn't
      combine a/b and a%b.
   (Is there a better way?)
   My current implementation is:

(define_expand "udivmodsi4"
  [(set (match_dup 4)
        (zero_extend:DI (match_operand:SI 1 "register_operand" "0")))
   (parallel [
   (set (match_operand:SI 0 "register_operand" "=a")
	(truncate:SI (udiv:DI (match_dup 4)
	             (zero_extend:DI (match_operand:SI 2 "nonimmediate_operand" "rm")))))
   (set (match_operand:SI 3 "register_operand" "=d")
	(truncate:SI (umod:DI (match_dup 4) (zero_extend:DI (match_dup 2)))))])]
   
   ""
   "operands[4] = gen_reg_rtx (DImode);")

(define_insn ""
  [(set (match_operand:SI 0 "register_operand" "=a")
	(truncate:SI (udiv:DI (match_operand:DI 1 "register_operand" "A")
	                      (zero_extend:DI (match_operand:SI 2 "nonimmediate_operand" "rm")))))
   (set (match_operand:SI 3 "register_operand" "=d")
	(truncate:SI (umod:DI (match_dup 1) (zero_extend:DI (match_dup 2)))))]
  ""
  "div%L0 %2"
  [(set_attr "type" "idiv")])

 o Split NEG into XOR and INC, and recombine them with a define_peephole
   when nothing was scheduled between the two instructions, since leaving
   them unpaired is actually a performance loss.
   The compiler doesn't bootstrap with it. Why? My implementation is:
 ;; On the Pentium it should be better to generate xor/inc instructions instead
 ;; of neg, because they can be paired.  If the scheduler fails to pair them,
 ;; it is a performance loss, so use a define_peephole to recombine them.
 
 (define_split
   [(set (match_operand:SI 0 "register_operand" "")
 	(neg:SI (match_operand:SI 1 "general_operand" "")))]
   "ix86_cpu == PROCESSOR_PENTIUM"
   [(set (match_dup 0)
         (match_dup 1))
    (set (match_dup 0)
         (not:SI (match_dup 0)))
    (set (match_dup 0)
         (plus:SI (match_dup 0) (const_int 1)))]
   "")
 
 (define_peephole
   [(set (match_operand:SI 0 "general_operand" "=r")
         (not:SI (match_dup 0)))
    (set (match_dup 0)
         (plus:SI (match_dup 0) (const_int 1)))]
   ""
   "neg%L0 %0")
 
 o The extendsidi pattern doesn't do anything special, so IMO it should be
   omitted to let gcc use its default version.

   There is a problem with life analysis. GCC adds a clobber before the
   extension, which creates an unnecessary conflict between source and
   target. Perhaps GCC's default version should be modified to handle this
   better. I've changed optabs to generate REG_NO_CONFLICT notes for the
   clobber and the final move.

   The problem is that the code handling REG_NO_CONFLICT is disabled in
   global.c, because it can't catch partial conflicts. So until that is
   fixed, this is probably not the way to go.
   I could try to change global.c to handle this case correctly, unless
   someone is planning larger changes to global.c.
   Is that a good idea? The change to global.c seems quite trivial to
   me. Is there any problem I am not aware of?
 o After looking at the sparc patches, it doesn't seem too hard for me to
   write MD_SCHED macros for the Pentium (the Pentium is probably the only
   processor in the x86 family that really needs scheduling and doesn't do
   it itself, so it is probably the only CPU where this is worthwhile).
   But I need to make the attributes as exact as possible. Can someone help
   me, for example, with how to recognize non-pairable immediate/displacement
   instructions? (I.e. how to tell from an operand that it uses a
   displacement.)
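
For the DImode arithmetic question, whatever split patterns are written have to reproduce what the `addl`/`adcl` pair computes: add the low words, then add the high words plus the carry out of the low addition. A hypothetical C model of that two-instruction sequence (`struct di_pair` and `add_di_split` are illustrative names, not GCC functions):

```c
#include <stdint.h>

struct di_pair { uint32_t lo, hi; };

/* Model of: addl b_lo,a_lo ; adcl b_hi,a_hi */
struct di_pair add_di_split(struct di_pair a, struct di_pair b)
{
    struct di_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;            /* addl: wraps modulo 2^32           */
    carry = r.lo < a.lo;           /* CF as left by addl (unsigned wrap) */
    r.hi = a.hi + b.hi + carry;    /* adcl: high words plus CF           */
    return r;
}
```

The carry is the reason the two halves cannot simply be two independent SImode adds: the second instruction consumes the flags set by the first, so the patterns must keep them adjacent.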
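
The udivmodsi4 pattern quoted in the divmod item describes `divl`, which divides the 64-bit EDX:EAX pair by a 32-bit operand and leaves the quotient in EAX and the remainder in EDX. Because one instruction yields both results, gcc only benefits if a/b and a%b end up in the same insn. A hypothetical C model of the RTL in that pattern (`model_divl` is an illustrative name; it returns false where the CPU would raise a divide error, i.e. a zero divisor or a quotient that does not fit in EAX):

```c
#include <stdbool.h>
#include <stdint.h>

/* (truncate:SI (udiv:DI edx:eax (zero_extend:DI op2))) -> EAX
   (truncate:SI (umod:DI edx:eax (zero_extend:DI op2))) -> EDX */
bool model_divl(uint32_t edx, uint32_t eax, uint32_t divisor,
                uint32_t *quot, uint32_t *rem)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    uint64_t q;

    if (divisor == 0)
        return false;              /* #DE: divide by zero            */
    q = dividend / divisor;
    if (q > 0xffffffffu)
        return false;              /* #DE: quotient does not fit EAX */
    *quot = (uint32_t)q;
    *rem = (uint32_t)(dividend % divisor);
    return true;
}
```

For udivmodsi4 the zero extension of operand 1 amounts to clearing EDX first, so `model_divl(0, a, b, &q, &r)` yields `a / b` and `a % b` together.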
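
The NEG split relies on the two's-complement identity -x = ~x + 1, and per the ChangeLog entry for the not patterns, the NOT is emitted as a pairable XOR with all ones on the Pentium. The value identity is checked by the hypothetical sketch below (`neg_via_xor_inc` is an illustrative name). Note that the sequences differ in flags: NEG sets CF whenever its operand is nonzero, while INC leaves CF unchanged, so the replacement is only equivalent when the condition codes from the NEG are not used.

```c
#include <stdint.h>

/* Value computed by the split: movl src,dst ; notl dst ; incl dst,
   with the NOT emitted as xorl $-1,dst on the Pentium. */
uint32_t neg_via_xor_inc(uint32_t x)
{
    uint32_t r = x;        /* (set (match_dup 0) (match_dup 1))   */
    r ^= 0xffffffffu;      /* NOT, as XOR with all ones           */
    r += 1u;               /* INC: (plus (match_dup 0) (const_int 1)) */
    return r;
}
```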

 
--- gcc/ChangeLog.orig	Wed Sep 23 21:54:36 1998
+++ gcc/ChangeLog	Thu Sep 24 00:00:23 1998
@@ -1,3 +1,20 @@
+Wed Sep 23 03:05:18 1998  Jan Hubicka <hubicka@freesoft.cz>
+
+	* config/i386/i386.md: Various small Pentium optimizations and
+	scheduling improvements.  Add prefix and pipes attributes to describe
+	the Pentium pipes.
+	(pentium scheduling parameters): Change parameters for fpdiv;
+	new function unit fpmul to describe the non-pipelined part of fpmul;
+	new function units upipe and vpipe approximating their real behaviour.
+	(test patterns): Prefer the pairable eax version.
+	(movdi): Split the trivial (non-overlapping) case.
+	(zero_extendsidi2): Split it; support extending from another
+	register and to memory to avoid unnecessary moves of the extended value.
+	(extendsidi2): Likewise; don't generate cltd on the Pentium.
+	(divqi3): Use truncate/sign_extend to make it a correct expression.
+	(xor patterns): Do not generate the non-pairable not on the Pentium.
+	(not patterns): Use the pairable xor on the Pentium.
+
 Mon Sep 14 14:02:53 PDT 1998 Jeff Law  (law@cygnus.com)
 
 	* version.c: Bump for snapshot.
--- gcc/config/i386/i386.md.orig	Wed Sep 16 17:36:37 1998
+++ gcc/config/i386/i386.md	Wed Sep 23 23:59:50 1998
@@ -75,6 +75,22 @@
   "integer,binary,memory,test,compare,fcompare,idiv,imul,lea,fld,fpop,fpdiv,fpmul"
   (const_string "integer"))
 
+;; True if the instruction has a 32-bit to 16-bit operand-size prefix.
+;; This is a _very_ rough approximation of the real situation, because many
+;; instruction patterns generate many different instructions, and I
+;; don't know how to describe it more exactly.  Someone should look at the
+;; HImode patterns and improve this.
+(define_attr "prefix"
+  "true,false"
+  (const_string "false"))
+
+;; Pipelines used by the Pentium.  FX is for floating point instructions that
+;; pair with fxch.
+;; This is just an approximation, for the same reasons as the "prefix" attribute.
+(define_attr "pipes"
+  "none,u,v,uv,fx"
+  (const_string "none"))
+
 (define_attr "memory" "none,load,store"
   (cond [(eq_attr "type" "idiv,lea")
 	 (const_string "none")
@@ -133,9 +149,26 @@
  (and (eq_attr "type" "fpop,fcompare") (eq_attr "cpu" "pentium,pentiumpro")) 
  3 0)
 
+;; Most FP instructions are decoded in u pipe
+(define_function_unit "upipe" 1 0
+ (and (eq_attr "type" "fpop,fcompare,fld,fpmul,fpdiv") (eq_attr "cpu" "pentium")) 
+ 1 0) 
+
+;; But some blocks vpipe too
+(define_function_unit "vpipe" 1 0
+ (and (and (eq_attr "type" "fpop,fcompare,fld,fpmul,fpdiv") (eq_attr "cpu" "pentium")) 
+      (eq_attr "pipes" "!fx"))
+ 1 0) 
+
 (define_function_unit "fp" 1 0
  (and (eq_attr "type" "fpmul") (eq_attr "cpu" "pentium")) 
- 7 0)
+ 7 0) 
+;; It is recommended to put one fp instruction between two fmuls,
+;; since the unit is not completely pipelined.
+(define_function_unit "fpmul" 1 1
+ (and (eq_attr "type" "fpmul") (eq_attr "cpu" "pentium")) 
+ 2 2) 
+
 
 (define_function_unit "fp" 1 0
  (and (eq_attr "type" "fpmul") (eq_attr "cpu" "pentiumpro")) 
@@ -150,9 +183,18 @@
  6 0)
 
 (define_function_unit "fp" 1 0
- (eq_attr "type" "fpdiv") 
+ (and (eq_attr "type" "fpdiv") 
+ (eq_attr "cpu" "!pentium"))
  10 10)
 
+;; fpdiv takes 38 cycles. 2 cycles should be used for fp instructions and
+;; rest for integer ones.
+(define_function_unit "fp" 1 0
+ (and (eq_attr "type" "fpdiv") 
+ (eq_attr "cpu" "pentium"))
+ 38 36)
+
+
 (define_function_unit "fp" 1 0
   (and (eq_attr "type" "fld") (eq_attr "cpu" "!pentiumpro,k6"))
  1 0)
@@ -165,9 +207,49 @@
 ;; i386 and i486 have one integer unit, which need not be modeled
 
 (define_function_unit "integer" 2 0
-  (and (eq_attr "type" "integer,binary,test,compare,lea") (eq_attr "cpu" "pentium,pentiumpro"))
+  (and (eq_attr "type" "integer,binary,test,compare,lea") (eq_attr "cpu" "pentiumpro"))
  1 0)
 
+;; The Pentium has u and v pipelines.  They work in a very strange way, so this
+;; is just an approximation.
+
+;; u-only and non-pairable instructions use the u pipe
+(define_function_unit "upipe" 1 0
+  (and (and (eq_attr "type" "integer,binary,test,compare,lea") (eq_attr "cpu" "pentium"))
+       (and (eq_attr "pipes" "u,none") (eq_attr "prefix" "false")))
+ 1 0)
+
+;; v-only and non-pairable instructions use the v pipe
+(define_function_unit "vpipe" 1 0
+  (and (and (eq_attr "type" "integer,binary,test,compare,lea") (eq_attr "cpu" "pentium"))
+       (and (eq_attr "pipes" "v,none") (eq_attr "prefix" "false")))
+ 1 0)
+
+;; Prefixed u instructions are pairable with another u instruction after the
+;; opcode is decoded.
+(define_function_unit "upipe" 1 0
+  (and (and (eq_attr "type" "integer,binary,test,compare,lea") (eq_attr "cpu" "pentium"))
+       (and (eq_attr "pipes" "u,uv") (eq_attr "prefix" "true")))
+ 1 1) ;; one extra cycle
+
+;; non pairable and v pairable instructions with prefixes don't pair at all
+(define_function_unit "upipe" 1 0
+  (and (and (eq_attr "type" "integer,binary,test,compare,lea") (eq_attr "cpu" "pentium"))
+       (and (eq_attr "pipes" "v,none") (eq_attr "prefix" "true")))
+ 2 2) ;; one extra cycle
+
+;; prefixed opcodes are not pairable with v instructions
+(define_function_unit "vpipe" 1 0
+  (and (and (eq_attr "type" "integer,binary,test,compare,lea") (eq_attr "cpu" "pentium"))
+       (and (eq_attr "pipes" "u") (eq_attr "prefix" "true")))
+ 2 2) ;; one extra cycle
+
+;; uv instructions take one extra cycle to avoid dependencies in uv-only code
+(define_function_unit "vpipe" 1 0
+  (and (and (eq_attr "type" "integer,binary,test,compare,lea") (eq_attr "cpu" "pentium"))
+       (and (eq_attr "pipes" "uv") (eq_attr "prefix" "false")))
+ 2 0)
+
 (define_function_unit "integer" 2 0
   (and (eq_attr "cpu" "k6")
        (and (eq_attr "type" "integer,binary,test,compare")
@@ -182,19 +264,36 @@
 	    (eq_attr "memory" "load")))
   3 0)
 
-;; Multiplies use one of the integer units
-(define_function_unit "integer" 2 0
+(define_function_unit "upipe" 1 0
   (and (eq_attr "cpu" "pentium") (eq_attr "type" "imul"))
   11 11)
 
-(define_function_unit "integer" 2 0
-  (and (eq_attr "cpu" "k6") (eq_attr "type" "imul"))
-  2 2)
+(define_function_unit "vpipe" 1 0
+  (and (eq_attr "cpu" "pentium") (eq_attr "type" "imul"))
+  11 11)
 
-(define_function_unit "integer" 2 0
+;; Even the fp unit is blocked
+(define_function_unit "fp" 1 0
+  (and (eq_attr "cpu" "pentium") (eq_attr "type" "imul"))
+  11 11)
+
+
+(define_function_unit "upipe" 1 0
   (and (eq_attr "cpu" "pentium") (eq_attr "type" "idiv"))
   25 25)
 
+(define_function_unit "vpipe" 1 0
+  (and (eq_attr "cpu" "pentium") (eq_attr "type" "idiv"))
+  25 25)
+
+(define_function_unit "fp" 1 0
+  (and (eq_attr "cpu" "pentium") (eq_attr "type" "idiv"))
+  25 25)
+
+(define_function_unit "integer" 2 0
+  (and (eq_attr "cpu" "k6") (eq_attr "type" "imul"))
+  2 2)
+
 (define_function_unit "integer" 2 0
   (and (eq_attr "cpu" "k6") (eq_attr "type" "idiv"))
   17 17)
@@ -217,7 +316,6 @@
 (define_function_unit "store" 1 0
   (and (eq_attr "cpu" "k6") (eq_attr "type" "lea"))
   1 0)
-
 
 ;; "movl MEM,REG / testl REG,REG" is faster on a 486 than "cmpl $0,MEM".
 ;; But restricting MEM here would mean that gcc could not remove a redundant
@@ -243,13 +341,14 @@
   ""
   "*
 {
-  if (REG_P (operands[0]))
+  if (REG_P(operands[0]))
     return AS2 (test%L0,%0,%0);
 
   operands[1] = const0_rtx;
   return AS2 (cmp%L0,%1,%0);
 }"
-  [(set_attr "type" "test")])
+  [(set_attr "pipes" "uv")
+   (set_attr "type" "test")])
 
 (define_expand "tstsi"
   [(set (cc0)
@@ -269,13 +368,15 @@
   ""
   "*
 {
-  if (REG_P (operands[0]))
+  if (REG_P(operands[0]))
     return AS2 (test%W0,%0,%0);
 
   operands[1] = const0_rtx;
   return AS2 (cmp%W0,%1,%0);
 }"
-  [(set_attr "type" "test")])
+  [(set_attr "pipes" "uv")
+   (set_attr "prefix" "true")
+   (set_attr "type" "test")])
 
 (define_expand "tsthi"
   [(set (cc0)
@@ -295,13 +396,14 @@
   ""
   "*
 {
-  if (REG_P (operands[0]))
+  if (REG_P(operands[0]))
     return AS2 (test%B0,%0,%0);
 
   operands[1] = const0_rtx;
   return AS2 (cmp%B0,%1,%0);
 }"
-  [(set_attr "type" "test")])
+  [(set_attr "pipes" "uv")
+   (set_attr "type" "test")])
 
 (define_expand "tstqi"
   [(set (cc0)
@@ -429,7 +531,8 @@
 		 (match_operand:SI 1 "general_operand" "ri,mr")))]
   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
   "* return AS2 (cmp%L0,%1,%0);"
-  [(set_attr "type" "compare")])
+  [(set_attr "pipes" "uv")
+   (set_attr "type" "compare")])
 
 (define_expand "cmpsi"
   [(set (cc0)
@@ -453,7 +556,9 @@
 		 (match_operand:HI 1 "general_operand" "ri,mr")))]
   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
   "* return AS2 (cmp%W0,%1,%0);"
-  [(set_attr "type" "compare")])
+  [(set_attr "prefix" "true")
+   (set_attr "pipes" "uv")
+   (set_attr "type" "compare")])
 
 (define_expand "cmphi"
   [(set (cc0)
@@ -477,7 +582,8 @@
 		 (match_operand:QI 1 "general_operand" "qm,nq")))]
   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
   "* return AS2 (cmp%B0,%1,%0);"
-  [(set_attr "type" "compare")])
+  [(set_attr "pipes" "uv")
+   (set_attr "type" "compare")])
 
 (define_expand "cmpqi"
   [(set (cc0)
@@ -825,8 +931,8 @@
 
 (define_insn ""
   [(set (cc0)
-	(and:SI (match_operand:SI 0 "general_operand" "%ro")
-		(match_operand:SI 1 "nonmemory_operand" "ri")))]
+	(and:SI (match_operand:SI 0 "general_operand" "%ro,a,ro")
+		(match_operand:SI 1 "nonmemory_operand" "r,i,i")))]
   ""
   "*
 {
@@ -880,12 +986,13 @@
 
   return AS2 (test%L1,%0,%1);
 }"
-  [(set_attr "type" "compare")])
+  [(set_attr "pipes" "uv,uv,none")
+   (set_attr "type" "compare")])
 
 (define_insn ""
   [(set (cc0)
-	(and:HI (match_operand:HI 0 "general_operand" "%ro")
-		(match_operand:HI 1 "nonmemory_operand" "ri")))]
+	(and:HI (match_operand:HI 0 "general_operand" "%ro,a,ro")
+		(match_operand:HI 1 "nonmemory_operand" "r,i,i")))]
   ""
   "*
 {
@@ -929,12 +1036,15 @@
 
   return AS2 (test%W1,%0,%1);
 }"
-  [(set_attr "type" "compare")])
+  [
+   (set_attr "pipes" "uv,uv,none")
+   (set_attr "prefix" "true,false,false") ; FIXME - bit too optimistic
+   (set_attr "type" "compare")])
 
 (define_insn ""
   [(set (cc0)
-	(and:QI (match_operand:QI 0 "nonimmediate_operand" "%qm")
-		(match_operand:QI 1 "nonmemory_operand" "qi")))]
+	(and:QI (match_operand:QI 0 "nonimmediate_operand" "%qm,a,qm")
+		(match_operand:QI 1 "nonmemory_operand" "q,i,i")))]
   ""
   "*
 {
@@ -943,7 +1053,8 @@
 
   return AS2 (test%B1,%0,%1);
 }"
-  [(set_attr "type" "compare")])
+  [(set_attr "pipes" "uv,uv,none")
+   (set_attr "type" "compare")])
 
 ;; move instructions.
 ;; There is one for each machine mode,
@@ -955,14 +1066,16 @@
 	(match_operand:SI 1 "nonmemory_operand" "rn"))]
   "flag_pic"
   "* return AS1 (push%L0,%1);"
-  [(set_attr "memory" "store")])
+  [(set_attr "pipes" "uv")
+   (set_attr "memory" "store")])
 
 (define_insn ""
   [(set (match_operand:SI 0 "push_operand" "=<")
 	(match_operand:SI 1 "nonmemory_operand" "ri"))]
   "!flag_pic"
   "* return AS1 (push%L0,%1);"
-  [(set_attr "memory" "store")])
+  [(set_attr "pipes" "uv")
+   (set_attr "memory" "store")])
 
 ;; On a 386, it is faster to push MEM directly.
 
@@ -1037,7 +1150,8 @@
 
   return AS2 (mov%L0,%1,%0);
 }"
-  [(set_attr "type" "integer,integer,memory")
+  [(set_attr "pipes" "uv")
+   (set_attr "type" "integer,integer,memory")
    (set_attr "memory" "*,*,load")])
 
 (define_insn ""
@@ -1069,7 +1183,8 @@
 
   return AS2 (mov%L0,%1,%0);
 }"
-  [(set_attr "type" "integer,memory")
+  [(set_attr "pipes" "uv")
+   (set_attr "type" "integer,memory")
    (set_attr "memory" "*,load")])
 
 (define_insn ""
@@ -1077,7 +1192,9 @@
 	(match_operand:HI 1 "nonmemory_operand" "ri"))]
   ""
   "* return AS1 (push%W0,%1);"
-  [(set_attr "type" "memory")
+  [(set_attr "pipes" "uv")
+   (set_attr "prefix" "true")
+   (set_attr "type" "memory")
    (set_attr "memory" "store")])
 
 (define_insn ""
@@ -1151,7 +1268,9 @@
 
   return AS2 (mov%W0,%1,%0);
 }"
-  [(set_attr "type" "integer,memory")
+  [(set_attr "prefix" "false,true")
+   (set_attr "pipes" "uv")
+   (set_attr "type" "integer,memory")
    (set_attr "memory" "*,load")])
 
 (define_expand "movstricthi"
@@ -1197,7 +1316,9 @@
 
   return AS2 (mov%W0,%1,%0);
 }"
-  [(set_attr "type" "integer,memory")])
+  [(set_attr "prefix" "true")
+   (set_attr "pipes" "uv")
+   (set_attr "type" "integer,memory")])
 
 ;; emit_push_insn when it calls move_by_pieces
 ;; requires an insn to "push a byte".
@@ -1207,7 +1328,9 @@
   [(set (match_operand:QI 0 "push_operand" "=<")
 	(match_operand:QI 1 "const_int_operand" "n"))]
   ""
-  "* return AS1(push%W0,%1);")
+  "* return AS1(push%W0,%1);"
+  [(set_attr "prefix" "true")
+   (set_attr "pipes" "uv")])
 
 (define_insn ""
   [(set (match_operand:QI 0 "push_operand" "=<")
@@ -1217,7 +1340,9 @@
 {
   operands[1] = gen_rtx_REG (HImode, REGNO (operands[1]));
   return AS1 (push%W0,%1);
-}")
+}"
+  [(set_attr "prefix" "true")
+   (set_attr "pipes" "uv")])
 
 ;; On i486, incb reg is faster than movb $1,reg.
 
@@ -1275,7 +1400,8 @@
     return (AS2 (mov%L0,%k1,%k0));
 
   return (AS2 (mov%B0,%1,%0));
-}")
+}"
+  [(set_attr "pipes" "uv")])
 
 ;; If it becomes necessary to support movstrictqi into %esi or %edi,
 ;; use the insn sequence:
@@ -1334,7 +1460,8 @@
     }
 
   return AS2 (mov%B0,%1,%0);
-}")
+}"
+  [(set_attr "pipes" "uv")])
 
 (define_insn "movsf_push"
   [(set (match_operand:SF 0 "push_operand" "=<,<")
@@ -1454,7 +1581,8 @@
 
   return singlemove_string (operands);
 }"
-  [(set_attr "type" "fld")])
+  [(set_attr "type" "fld")
+   (set_attr "pipes" "none,fx,fx,none")])
 
 
 (define_insn "swapsf"
@@ -1469,7 +1597,7 @@
     return AS1 (fxch,%1);
   else
     return AS1 (fxch,%0);
-}")
+}" [(set_attr "pipes" "v")])
 
 
 (define_insn "movdf_push"
@@ -1592,7 +1720,8 @@
 
   return output_move_double (operands);
 }"
-  [(set_attr "type" "fld")])
+  [(set_attr "type" "fld")
+   (set_attr "pipes" "none,fx,fx,none")])
 
 
 
@@ -1608,7 +1737,7 @@
     return AS1 (fxch,%1);
   else
     return AS1 (fxch,%0);
-}")
+}" [(set_attr "pipes" "v")])
 
 (define_insn "movxf_push"
   [(set (match_operand:XF 0 "push_operand" "=<,<")
@@ -1743,7 +1872,7 @@
     return AS1 (fxch,%1);
   else
     return AS1 (fxch,%0);
-}")
+}" [(set_attr "pipes" "v")])
 
 (define_insn ""
   [(set (match_operand:DI 0 "push_operand" "=<")
@@ -1782,6 +1911,17 @@
   [(set_attr "type" "integer,memory")
    (set_attr "memory" "*,load")])
 
+;; Split the trivial (non-overlapping) case of movdi
+(define_split 
+  [(set (match_operand:DI 0 "general_operand" "or")
+	(match_operand:DI 1 "general_operand" "or"))]
+  "!reg_overlap_mentioned_p (operands[0], operands[1]) &&
+   (reload_completed | reload_in_progress)"
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 6))]
+  "split_di (&operands[0], 1, &operands[3], &operands[4]);
+   split_di (&operands[1], 1, &operands[5], &operands[6]);")
+
 
 ;;- conversion instructions
 ;;- NONE
@@ -1829,7 +1969,9 @@
 #else
   return AS2 (movz%W0%L0,%1,%0);
 #endif
-}")
+}"
+   [(set_attr "pipes" "uv")
+    (set_attr "prefix" "false,true,true")])
 
 (define_split
   [(set (match_operand:SI 0 "register_operand" "")
@@ -1892,7 +2034,8 @@
 #else
   return AS2 (movz%B0%W0,%1,%0);
 #endif
-}")
+}"
+   [(set_attr "pipes" "uv")])
 
 (define_split
   [(set (match_operand:HI 0 "register_operand" "")
@@ -1983,7 +2126,8 @@
 #else
   return AS2 (movz%B0%L0,%1,%0);
 #endif
-}")
+}"
+   [(set_attr "pipes" "uv")])
 
 (define_split
   [(set (match_operand:SI 0 "register_operand" "")
@@ -2021,6 +2165,11 @@
 	       (const_int 255)))]
  "operands[2] = gen_rtx_REG (SImode, true_regnum (operands[1]));")
 
+
+;; This insn is not generated when optimizing, since it is handled by the next split.
+;; Once global.c is changed to handle REG_NO_CONFLICT correctly, this insn
+;; should be removed completely, since the standard gcc version is as good as
+;; this one.
 (define_insn "zero_extendsidi2"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,?r,?m")
 	(zero_extend:DI (match_operand:SI 1 "register_operand" "0,rm,r")))]
@@ -2049,17 +2198,38 @@
     output_asm_insn (AS2 (xor%L2,%2,%2), xops);
 
   RET;
-}")
+  }
+ "
+ [(set_attr "pipes" "none")])
+
+(define_split 
+  [(set (match_operand:DI 0 "general_operand" "")
+	(zero_extend:DI (match_operand:SI 1 "register_operand" "")))]
+  "reload_completed | reload_in_progress"
+  [(set (match_dup 3) (match_dup 1))
+   (set (match_dup 4) (const_int 0))]
+  "split_di (&operands[0], 1, &operands[3], &operands[4]);")
 
-;;- sign extension instructions
+;; - sign extension instructions
+
+;; Only the cltd case is generated when optimization is enabled, since the
+;; other cases are handled by the next splits.
 
+;; It is better to move the SI value than to let GCC generate a move of the
+;; DI value.  It is also better to sign extend into memory than to let GCC
+;; sign extend in registers and then move them.
 (define_insn "extendsidi2"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-	(sign_extend:DI (match_operand:SI 1 "register_operand" "0")))]
+  [(set (match_operand:DI 0 "register_operand" "=Ar,?Ar,?o")
+	(sign_extend:DI (match_operand:SI 1 "register_operand" "0,mr,r")))]
   ""
   "*
 {
-  if (REGNO (operands[0]) == 0)
+  if (!REG_P (operands[1]) || !REG_P (operands[0]) || 
+      (REGNO (operands[0]) != REGNO (operands[1])))
+  {
+     output_asm_insn (AS2 (mov%L0,%1,%0), operands);
+  }
+  if (REG_P (operands[0]) && REGNO (operands[0]) == 0)
     {
       /* This used to be cwtl, but that extends HI to SI somehow.  */
 #ifdef INTEL_SYNTAX
@@ -2069,12 +2239,43 @@
 #endif
     }
 
-  operands[1] = gen_rtx_REG (SImode, REGNO (operands[0]) + 1);
-  output_asm_insn (AS2 (mov%L0,%0,%1), operands);
+  split_di (&operands[0], 1, &operands[3], &operands[4]);
+  if (GET_CODE (operands[0]) == MEM)
+  output_asm_insn (AS2 (mov%L0,%1,%4), operands); else
+  output_asm_insn (AS2 (mov%L0,%0,%4), operands);
 
   operands[0] = GEN_INT (31);
-  return AS2 (sar%L1,%0,%1);
-}")
+  return AS2 (sar%L1,%0,%4);
+}"
+ [(set_attr "pipes" "none")])
+
+(define_split 
+  [(set (match_operand:DI 0 "general_operand" "")
+	(sign_extend:DI (match_operand:SI 1 "register_operand" "")))]
+  "(reload_completed | reload_in_progress) && i386_aligned_p (operands[0]) &&
+   i386_aligned_p (operands[1])"
+  [(set (match_dup 4) (match_dup 1))
+   (set (match_dup 3) (match_dup 1))
+   (set (match_dup 4)
+        (ashiftrt:SI (match_dup 4) (const_int 31)))
+  ]
+  "if (REG_P (operands[0]) && REGNO (operands[0]) == 0
+      && (optimize_size || ix86_cpu != PROCESSOR_PENTIUM)) FAIL;
+   split_di (&operands[0], 1, &operands[3], &operands[4]);")
+
+(define_split 
+  [(set (match_operand:DI 0 "register_operand" "")
+	(sign_extend:DI (match_operand:SI 1 "general_operand" "")))]
+  "(reload_completed | reload_in_progress) && i386_aligned_p (operands[0]) &&
+   i386_aligned_p (operands[1])"
+  [(set (match_dup 3) (match_dup 1))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 4)
+        (ashiftrt:SI (match_dup 4) (const_int 31)))
+  ]
+  "if (REG_P (operands[0]) && REGNO (operands[0]) == 0
+      && (optimize_size || ix86_cpu != PROCESSOR_PENTIUM)) FAIL;
+   split_di (&operands[0], 1, &operands[3], &operands[4]);")
 
 ;; Note that the i386 programmers' manual says that the opcodes
 ;; are named movsx..., but the assembler on Unix does not accept that.
@@ -2168,7 +2369,8 @@
     output_asm_insn (AS2 (mov%L0,%1,%0), xops);
 
   RET;
-}")
+}"
+   [(set_attr "pipes" "uv")])
 
 (define_insn ""
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,m")
@@ -2186,7 +2388,8 @@
     output_asm_insn (AS2 (mov%L0,%1,%0), xops);
 
   RET;
-}")
+}"
+   [(set_attr "pipes" "uv")])
 
 
 
@@ -3054,7 +3257,8 @@
 
   return AS2 (add%L0,%2,%0);
 }"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary,binary,lea")
+   (set_attr "pipes" "uv")])
 
 ;; addsi3 is faster, so put this after.
 
@@ -3084,7 +3288,8 @@
   CC_STATUS_INIT;
   return AS2 (lea%L0,%a1,%0);
 }"
-  [(set_attr "type" "lea")])
+  [(set_attr "type" "lea")
+   (set_attr "pipes" "uv")])
 
 ;; ??? `lea' here, for three operand add?  If leaw is used, only %bx,
 ;; %si and %di can appear in SET_SRC, and output_asm_insn might not be
@@ -3155,7 +3360,9 @@
 
   return AS2 (add%W0,%2,%0);
 }"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "prefix" "true")
+   (set_attr "pipes" "uv")])
 
 (define_expand "addqi3"
   [(set (match_operand:QI 0 "general_operand" "")
@@ -3181,7 +3388,8 @@
 
   return AS2 (add%B0,%2,%0);
 }"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "pipes" "uv")])
 
 ;Lennart Augustsson <augustss@cs.chalmers.se>
 ;says this pattern just makes slower code:
@@ -3370,7 +3578,8 @@
 		  (match_operand:SI 2 "general_operand" "ri,rm")))]
   "ix86_binary_operator_ok (MINUS, SImode, operands)"
   "* return AS2 (sub%L0,%2,%0);"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "pipes" "uv")])
 
 (define_expand "subhi3"
   [(set (match_operand:HI 0 "general_operand" "")
@@ -3411,7 +3620,8 @@
 		  (match_operand:QI 2 "general_operand" "qn,qmn")))]
   "ix86_binary_operator_ok (MINUS, QImode, operands)"
   "* return AS2 (sub%B0,%2,%0);"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "pipes" "uv")])
 
 ;; The patterns that match these are at the end of this file.
 
@@ -3459,7 +3669,8 @@
     return AS2 (imul%W0,%2,%0);
   return AS3 (imul%W0,%2,%1,%0);
 }"
-  [(set_attr "type" "imul")])
+  [(set_attr "type" "imul")
+   (set_attr "prefix" "true")])
 
 (define_insn "mulsi3"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
@@ -3507,7 +3718,8 @@
 		 (sign_extend:DI (match_operand:SI 2 "nonimmediate_operand" "rm"))))]
   "TARGET_WIDE_MULTIPLY"
   "imul%L0 %2"
-  [(set_attr "type" "imul")])
+  [(set_attr "type" "imul")
+   (set_attr "prefix" "true")])
 
 (define_insn "umulsi3_highpart"
   [(set (match_operand:SI 0 "register_operand" "=d")
@@ -3556,15 +3768,15 @@
 
 (define_insn "divqi3"
   [(set (match_operand:QI 0 "register_operand" "=a")
-	(div:QI (match_operand:HI 1 "register_operand" "0")
-		(match_operand:QI 2 "nonimmediate_operand" "qm")))]
+	(truncate:QI (div:HI (match_operand:HI 1 "register_operand" "0")
+		             (sign_extend:HI (match_operand:QI 2 "nonimmediate_operand" "qm")))))]
   ""
   "idiv%B0 %2")
 
 (define_insn "udivqi3"
   [(set (match_operand:QI 0 "register_operand" "=a")
-	(udiv:QI (match_operand:HI 1 "register_operand" "0")
-		 (match_operand:QI 2 "nonimmediate_operand" "qm")))]
+	(truncate:QI (udiv:HI (match_operand:HI 1 "register_operand" "0")
+		              (zero_extend:HI (match_operand:QI 2 "nonimmediate_operand" "qm")))))]
   ""
   "div%B0 %2"
   [(set_attr "type" "idiv")])
@@ -3592,6 +3804,8 @@
   "TARGET_80387"
   "")
 
+
+
 ;; Remainder instructions.
 
 (define_insn "divmodsi4"
@@ -3838,7 +4052,8 @@
 
   return AS2 (and%L0,%2,%0);
 }"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "pipes" "uv")])
 
 (define_insn "andhi3"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r")
@@ -3917,7 +4132,9 @@
 
   return AS2 (and%W0,%2,%0);
 }"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "prefix" "true")
+   (set_attr "pipes" "uv")])
 
 (define_insn "andqi3"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q")
@@ -3925,7 +4142,8 @@
 		(match_operand:QI 2 "general_operand" "qn,qmn")))]
   ""
   "* return AS2 (and%B0,%2,%0);"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "pipes" "uv")])
 
 /* I am nervous about these two.. add them later..
 ;I presume this means that we have something in say op0= eax which is small
@@ -4042,7 +4260,8 @@
 
   return AS2 (or%L0,%2,%0);
 }"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "pipes" "uv")])
 
 (define_insn "iorhi3"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r")
@@ -4127,7 +4346,9 @@
 
   return AS2 (or%W0,%2,%0);
 }"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "prefix" "true")
+   (set_attr "pipes" "uv")])
 
 (define_insn "iorqi3"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q")
@@ -4135,7 +4356,8 @@
 		(match_operand:QI 2 "general_operand" "qn,qmn")))]
   ""
   "* return AS2 (or%B0,%2,%0);"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "pipes" "uv")])
 
 ;;- xor instructions
 
@@ -4171,7 +4393,8 @@
 byte_xor_operation:
 	    CC_STATUS_INIT;
 	      
-	    if (intval == 0xff)
+	    if (intval == 0xff && (optimize_size ||
+		    ix86_cpu!=PROCESSOR_PENTIUM))
 	      return AS1 (not%B0,%b0);
 
 	    if (intval != INTVAL (operands[2]))
@@ -4187,7 +4410,8 @@
 	  if (REG_P (operands[0]))
 	    {
 	      CC_STATUS_INIT;
-	      if (intval == 0xff)
+	      if (intval == 0xff && (optimize_size ||
+		      ix86_cpu!=PROCESSOR_PENTIUM))
 		return AS1 (not%B0,%h0);
 
 	      operands[2] = GEN_INT (intval);
@@ -4224,7 +4448,8 @@
 
   return AS2 (xor%L0,%2,%0);
 }"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "pipes" "uv")])
 
 (define_insn "xorhi3"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r")
@@ -4244,7 +4469,8 @@
 	  if (INTVAL (operands[2]) & 0xffff0000)
 	    operands[2] = GEN_INT (INTVAL (operands[2]) & 0xffff);
 
-	  if (INTVAL (operands[2]) == 0xff)
+	  if (INTVAL (operands[2]) == 0xff
+	      && (optimize_size || ix86_cpu != PROCESSOR_PENTIUM))
 	    return AS1 (not%B0,%b0);
 
 	  return AS2 (xor%B0,%2,%b0);
@@ -4258,9 +4484,9 @@
 	  CC_STATUS_INIT;
 	  operands[2] = GEN_INT ((INTVAL (operands[2]) >> 8) & 0xff);
 
-	  if (INTVAL (operands[2]) == 0xff)
+	  if (INTVAL (operands[2]) == 0xff
+	      && (optimize_size || ix86_cpu != PROCESSOR_PENTIUM))
 	    return AS1 (not%B0,%h0);
-
 	  return AS2 (xor%B0,%2,%h0);
 	}
     }
@@ -4286,7 +4512,9 @@
 
   return AS2 (xor%W0,%2,%0);
 }"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "prefix" "true")
+   (set_attr "pipes" "uv")])
 
 (define_insn "xorqi3"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q")
@@ -4294,7 +4522,8 @@
 		(match_operand:QI 2 "general_operand" "qn,qm")))]
   ""
   "* return AS2 (xor%B0,%2,%0);"
-  [(set_attr "type" "binary")])
+  [(set_attr "type" "binary")
+   (set_attr "pipes" "uv")])
 
 ;; logical operations for DImode
 
@@ -4357,6 +4586,7 @@
   RET;
 }")
 
+
 (define_insn "negsi2"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rm")
 	(neg:SI (match_operand:SI 1 "nonimmediate_operand" "0")))]
@@ -4367,7 +4597,9 @@
   [(set (match_operand:HI 0 "nonimmediate_operand" "=rm")
 	(neg:HI (match_operand:HI 1 "nonimmediate_operand" "0")))]
   ""
-  "neg%W0 %0")
+  "neg%W0 %0"
+  [(set_attr "prefix" "true")])
+
 
 (define_insn "negqi2"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm")
@@ -4379,31 +4611,36 @@
   [(set (match_operand:SF 0 "register_operand" "=f")
 	(neg:SF (match_operand:SF 1 "register_operand" "0")))]
   "TARGET_80387"
-  "fchs")
+  "fchs"
+  [(set_attr "pipes" "fx")])
 
 (define_insn "negdf2"
   [(set (match_operand:DF 0 "register_operand" "=f")
 	(neg:DF (match_operand:DF 1 "register_operand" "0")))]
   "TARGET_80387"
-  "fchs")
+  "fchs"
+  [(set_attr "pipes" "fx")])
 
 (define_insn ""
   [(set (match_operand:DF 0 "register_operand" "=f")
 	(neg:DF (float_extend:DF (match_operand:SF 1 "register_operand" "0"))))]
   "TARGET_80387"
-  "fchs")
+  "fchs"
+  [(set_attr "pipes" "fx")])
 
 (define_insn "negxf2"
   [(set (match_operand:XF 0 "register_operand" "=f")
 	(neg:XF (match_operand:XF 1 "register_operand" "0")))]
   "TARGET_80387"
-  "fchs")
+  "fchs"
+  [(set_attr "pipes" "fx")])
 
 (define_insn ""
   [(set (match_operand:XF 0 "register_operand" "=f")
 	(neg:XF (float_extend:XF (match_operand:DF 1 "register_operand" "0"))))]
   "TARGET_80387"
-  "fchs")
+  "fchs"
+  [(set_attr "pipes" "fx")])
 
 ;; Absolute value instructions
 
@@ -4412,35 +4649,40 @@
 	(abs:SF (match_operand:SF 1 "register_operand" "0")))]
   "TARGET_80387"
   "fabs"
-  [(set_attr "type" "fpop")])
+  [(set_attr "type" "fpop")
+   (set_attr "pipes" "fx")])
 
 (define_insn "absdf2"
   [(set (match_operand:DF 0 "register_operand" "=f")
 	(abs:DF (match_operand:DF 1 "register_operand" "0")))]
   "TARGET_80387"
   "fabs"
-  [(set_attr "type" "fpop")])
+  [(set_attr "type" "fpop")
+   (set_attr "pipes" "fx")])
 
 (define_insn ""
   [(set (match_operand:DF 0 "register_operand" "=f")
 	(abs:DF (float_extend:DF (match_operand:SF 1 "register_operand" "0"))))]
   "TARGET_80387"
   "fabs"
-  [(set_attr "type" "fpop")])
+  [(set_attr "type" "fpop")
+   (set_attr "pipes" "fx")])
 
 (define_insn "absxf2"
   [(set (match_operand:XF 0 "register_operand" "=f")
 	(abs:XF (match_operand:XF 1 "register_operand" "0")))]
   "TARGET_80387"
   "fabs"
-  [(set_attr "type" "fpop")])
+  [(set_attr "type" "fpop")
+   (set_attr "pipes" "fx")])
 
 (define_insn ""
   [(set (match_operand:XF 0 "register_operand" "=f")
 	(abs:XF (float_extend:XF (match_operand:DF 1 "register_operand" "0"))))]
   "TARGET_80387"
   "fabs"
-  [(set_attr "type" "fpop")])
+  [(set_attr "type" "fpop")
+   (set_attr "pipes" "fx")])
 
 (define_insn "sqrtsf2"
   [(set (match_operand:SF 0 "register_operand" "=f")
@@ -4536,22 +4778,56 @@
 ;;- one complement instructions
 
 (define_insn "one_cmplsi2"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm")
-	(not:SI (match_operand:SI 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,m")
+	(not:SI (match_operand:SI 1 "nonimmediate_operand" "0,0")))]
   ""
-  "not%L0 %0")
+  "*
+  rtx xops[2];
+  if (ix86_cpu == PROCESSOR_PENTIUM && !optimize_size
+      && GET_CODE (operands[0]) != MEM)
+    {
+      xops[0] = operands[0];
+      xops[1] = GEN_INT (0xffffffff);
+      output_asm_insn (AS2 (xor%L0,%1,%0), xops);
+      RET;
+    }
+  return AS1 (not%L0,%0);"
+  [(set_attr "pipes" "uv,none")])
 
 (define_insn "one_cmplhi2"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm")
-	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,m")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,0")))]
   ""
-  "not%W0 %0")
+  "*
+  rtx xops[2];
+  if (ix86_cpu == PROCESSOR_PENTIUM && !optimize_size
+      && GET_CODE (operands[0]) != MEM)
+    {
+      xops[0] = operands[0];
+      xops[1] = GEN_INT (0xffff);
+      output_asm_insn (AS2 (xor%W0,%1,%0), xops);
+      RET;
+    }
+  return AS1 (not%W0,%0);"
+  [(set_attr "prefix" "true")
+   (set_attr "pipes" "uv,none")])
 
 (define_insn "one_cmplqi2"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
   ""
-  "not%B0 %0")
+  "*
+  rtx xops[2];
+  if (ix86_cpu == PROCESSOR_PENTIUM && !optimize_size
+      && GET_CODE (operands[0]) != MEM)
+    {
+      xops[0] = operands[0];
+      xops[1] = GEN_INT (0xff);
+      output_asm_insn (AS2 (xor%B0,%1,%0), xops);
+      RET;
+    }
+  return AS1 (not%B0,%0);"
+  [(set_attr "pipes" "uv,none")])
 
 ;;- arithmetic shift instructions
 
@@ -4631,7 +4907,8 @@
       output_asm_insn (AS2 (sal%L2,%0,%2), xops);
     }
   RET;
-}")
+}"
+  [(set_attr "pipes" "u")])
 
 (define_insn "ashldi3_non_const_int"
   [(set (match_operand:DI 0 "register_operand" "=&r")
@@ -4667,9 +4944,9 @@
 ;; is smaller - use leal for now unless the shift count is 1.
 
 (define_insn "ashlsi3"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,rm")
-	(ashift:SI (match_operand:SI 1 "nonimmediate_operand" "r,0")
-		   (match_operand:SI 2 "nonmemory_operand" "M,cI")))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,rm,rm")
+	(ashift:SI (match_operand:SI 1 "nonimmediate_operand" "r,0,0")
+		   (match_operand:SI 2 "nonmemory_operand" "M,I,c")))]
   ""
   "*
 {
@@ -4702,12 +4979,13 @@
     return AS2 (add%L0,%0,%0);
 
   return AS2 (sal%L0,%2,%0);
-}")
+}"
+  [(set_attr "pipes" "u,u,none")])
 
 (define_insn "ashlhi3"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm")
-	(ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0")
-		   (match_operand:HI 2 "nonmemory_operand" "cI")))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,rm")
+	(ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,0")
+		   (match_operand:HI 2 "nonmemory_operand" "I,c")))]
   ""
   "*
 {
@@ -4718,12 +4996,14 @@
     return AS2 (add%W0,%0,%0);
 
   return AS2 (sal%W0,%2,%0);
-}")
+}"
+  [(set_attr "pipes" "u,none")
+   (set_attr "prefix" "true")])
 
 (define_insn "ashlqi3"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm")
-	(ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0")
-		   (match_operand:QI 2 "nonmemory_operand" "cI")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,qm")
+	(ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")
+		   (match_operand:QI 2 "nonmemory_operand" "I,c")))]
   ""
   "*
 {
@@ -4734,7 +5014,9 @@
     return AS2 (add%B0,%0,%0);
 
   return AS2 (sal%B0,%2,%0);
-}")
+}"
+  [(set_attr "pipes" "u,none")])
+
 
 ;; See comment above `ashldi3' about how this works.
 
@@ -4781,7 +5063,8 @@
     output_asm_insn (AS2 (xor%L2,%2,%2), xops);
 
   RET;
-}")
+}"
+  [(set_attr "pipes" "uv")])
 
 (define_insn "ashrdi3_const_int"
   [(set (match_operand:DI 0 "register_operand" "=&r")
@@ -4852,9 +5135,9 @@
 }")
 
 (define_insn "ashrsi3"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm")
-	(ashiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0")
-		     (match_operand:SI 2 "nonmemory_operand" "cI")))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm")
+	(ashiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,0")
+		     (match_operand:SI 2 "nonmemory_operand" "I,c")))]
   ""
   "*
 {
@@ -4862,12 +5145,13 @@
     return AS2 (sar%L0,%b2,%0);
   else
     return AS2 (sar%L0,%2,%0);
-}")
+}"
+  [(set_attr "pipes" "u,none")])
 
 (define_insn "ashrhi3"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm")
-	(ashiftrt:HI (match_operand:HI 1 "nonimmediate_operand" "0")
-		     (match_operand:HI 2 "nonmemory_operand" "cI")))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,rm")
+	(ashiftrt:HI (match_operand:HI 1 "nonimmediate_operand" "0,0")
+		     (match_operand:HI 2 "nonmemory_operand" "I,c")))]
   ""
   "*
 {
@@ -4875,12 +5159,14 @@
     return AS2 (sar%W0,%b2,%0);
   else
     return AS2 (sar%W0,%2,%0);
-}")
+}"
+  [(set_attr "pipes" "u,none")
+   (set_attr "prefix" "true")])
 
 (define_insn "ashrqi3"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm")
-	(ashiftrt:QI (match_operand:QI 1 "nonimmediate_operand" "0")
-		     (match_operand:QI 2 "nonmemory_operand" "cI")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,qm")
+	(ashiftrt:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")
+		     (match_operand:QI 2 "nonmemory_operand" "I,c")))]
   ""
   "*
 {
@@ -4888,7 +5174,8 @@
     return AS2 (sar%B0,%b2,%0);
   else
     return AS2 (sar%B0,%2,%0);
-}")
+}"
+  [(set_attr "pipes" "u,none")])
 
 ;;- logical shift instructions
 
@@ -5006,9 +5293,9 @@
 }")
 
 (define_insn "lshrsi3"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm")
-	(lshiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0")
-		     (match_operand:SI 2 "nonmemory_operand" "cI")))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm")
+	(lshiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,0")
+		     (match_operand:SI 2 "nonmemory_operand" "I,c")))]
   ""
   "*
 {
@@ -5016,12 +5303,13 @@
     return AS2 (shr%L0,%b2,%0);
   else
     return AS2 (shr%L0,%2,%1);
-}")
+}"
+  [(set_attr "pipes" "u,none")])
 
 (define_insn "lshrhi3"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm")
-	(lshiftrt:HI (match_operand:HI 1 "nonimmediate_operand" "0")
-		     (match_operand:HI 2 "nonmemory_operand" "cI")))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,rm")
+	(lshiftrt:HI (match_operand:HI 1 "nonimmediate_operand" "0,0")
+		     (match_operand:HI 2 "nonmemory_operand" "I,c")))]
   ""
   "*
 {
@@ -5029,12 +5317,14 @@
     return AS2 (shr%W0,%b2,%0);
   else
     return AS2 (shr%W0,%2,%0);
-}")
+}"
+  [(set_attr "prefix" "true")
+   (set_attr "pipes" "u,none")])
 
 (define_insn "lshrqi3"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm")
-	(lshiftrt:QI (match_operand:QI 1 "nonimmediate_operand" "0")
-		     (match_operand:QI 2 "nonmemory_operand" "cI")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,qm")
+	(lshiftrt:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")
+		     (match_operand:QI 2 "nonmemory_operand" "I,c")))]
   ""
   "*
 {
@@ -5042,7 +5332,8 @@
     return AS2 (shr%B0,%b2,%0);
   else
     return AS2 (shr%B0,%2,%0);
-}")
+}"
+  [(set_attr "pipes" "u,none")])
 
 ;;- rotate instructions
 
@@ -5070,7 +5361,8 @@
     return AS2 (rol%W0,%b2,%0);
   else
     return AS2 (rol%W0,%2,%0);
-}")
+}"
+  [(set_attr "prefix" "true")])
 
 (define_insn "rotlqi3"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm")
@@ -5109,7 +5401,8 @@
     return AS2 (ror%W0,%b2,%0);
   else
     return AS2 (ror%W0,%2,%0);
-}")
+}"
+  [(set_attr "prefix" "true")])
 
 (define_insn "rotrqi3"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm")
@@ -5210,7 +5503,8 @@
     return AS2 (bts%L0,%2,%0);
   else
     return AS2 (btr%L0,%2,%0);
-}")
+}"
+  [(set_attr "prefix" "true")])
 
 ;; Bit complement.  See comments on previous pattern.
 ;; ??? Is this really worthwhile?
@@ -5225,7 +5519,8 @@
   CC_STATUS_INIT;
 
   return AS2 (btc%L0,%1,%0);
-}")
+}"
+  [(set_attr "prefix" "true")])
 
 (define_insn ""
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rm")
@@ -5238,7 +5533,8 @@
   CC_STATUS_INIT;
 
   return AS2 (btc%L0,%2,%0);
-}")
+}"
+  [(set_attr "prefix" "true")])
 
 ;; Recognizers for bit-test instructions.
 
@@ -5259,12 +5555,13 @@
 {
   cc_status.flags |= CC_Z_IN_NOT_C;
   return AS2 (bt%L0,%1,%0);
-}")
+}"
+  [(set_attr "prefix" "true")])
 
 (define_insn ""
-  [(set (cc0) (zero_extract (match_operand:SI 0 "register_operand" "r")
-			    (match_operand:SI 1 "const_int_operand" "n")
-			    (match_operand:SI 2 "const_int_operand" "n")))]
+  [(set (cc0) (zero_extract (match_operand:SI 0 "register_operand" "a,r")
+			    (match_operand:SI 1 "const_int_operand" "n,n")
+			    (match_operand:SI 2 "const_int_operand" "n,n")))]
   ""
   "*
 {
@@ -5290,7 +5587,9 @@
     }
 
   return AS2 (test%L0,%1,%0);
-}")
+}"
+  [(set_attr "pipes" "uv,none")])
+
 
 ;; ??? All bets are off if operand 0 is a volatile MEM reference.
 ;; The CPU may access unspecified bytes around the actual target byte.
@@ -5350,7 +5649,8 @@
     return AS2 (test%L0,%1,%0);
 
   return AS2 (test%L1,%0,%1);
-}")
+}"
+  [(set_attr "pipes" "uv")])
 
 ;; Store-flag instructions.
 
@@ -5671,7 +5971,8 @@
     return (char *)0;
 
   return AS1(j%D0,%l1);
-}")
+}"
+  [(set_attr "pipes" "v")])
 
 (define_insn ""
   [(set (pc)
@@ -5725,7 +6026,8 @@
     return (char *)0;
 
   return AS1(j%d0,%l1);
-}")
+}"
+  [(set_attr "pipes" "v")])
 
 ;; Unconditional and other jump instructions
 
@@ -5733,7 +6035,8 @@
   [(set (pc)
 	(label_ref (match_operand 0 "" "")))]
   ""
-  "jmp %l0")
+  "jmp %l0"
+  [(set_attr "pipes" "v")])
 
 (define_insn "indirect_jump"
   [(set (pc) (match_operand:SI 0 "nonimmediate_operand" "rm"))]
@@ -5743,7 +6046,8 @@
   CC_STATUS_INIT;
 
   return AS1 (jmp,%*%0);
-}")
+}"
+  [(set_attr "pipes" "v")])
 
 ;; ??? could transform while(--i > 0) S; to if (--i > 0) do S; while(--i);
 ;;     if S does not change i
@@ -6075,7 +6379,8 @@
     }
   else
     return AS1 (call,%P0);
-}")
+}"
+  [(set_attr "pipes" "v")])
 
 (define_insn ""
   [(call (mem:QI (match_operand:SI 0 "symbolic_operand" ""))
@@ -6083,7 +6388,8 @@
    (set (reg:SI 7) (plus:SI (reg:SI 7)
 			    (match_operand:SI 3 "immediate_operand" "i")))]
   "!HALF_PIC_P ()"
-  "call %P0")
+  "call %P0"
+  [(set_attr "pipes" "v")])
 
 (define_expand "call"
   [(call (match_operand:QI 0 "indirect_operand" "")
@@ -6123,14 +6429,16 @@
     }
   else
     return AS1 (call,%P0);
-}")
+}"
+  [(set_attr "pipes" "v")])
 
 (define_insn ""
   [(call (mem:QI (match_operand:SI 0 "symbolic_operand" ""))
 	 (match_operand:SI 1 "general_operand" "g"))]
   ;; Operand 1 not used on the i386.
   "!HALF_PIC_P ()"
-  "call %P0")
+  "call %P0"
+  [(set_attr "pipes" "v")])
 
 ;; Call subroutine, returning value in operand 0
 ;; (which must be a hard register).
@@ -6180,7 +6488,8 @@
     output_asm_insn (AS1 (call,%P1), operands);
 
   RET;
-}")
+}"
+  [(set_attr "pipes" "v")])
 
 (define_insn ""
   [(set (match_operand 0 "" "=rf")
@@ -6189,7 +6498,8 @@
    (set (reg:SI 7) (plus:SI (reg:SI 7)
 			    (match_operand:SI 4 "immediate_operand" "i")))]
   "!HALF_PIC_P ()"
-  "call %P1")
+  "call %P1"
+  [(set_attr "pipes" "v")])
 
 (define_expand "call_value"
   [(set (match_operand 0 "" "")
@@ -6233,7 +6543,8 @@
     output_asm_insn (AS1 (call,%P1), operands);
 
   RET;
-}")
+}"
+  [(set_attr "pipes" "v")])
 
 (define_insn ""
   [(set (match_operand 0 "" "=rf")
@@ -6241,7 +6552,8 @@
 	      (match_operand:SI 2 "general_operand" "g")))]
   ;; Operand 2 not used on the i386.
   "!HALF_PIC_P ()"
-  "call %P1")
+  "call %P1"
+  [(set_attr "pipes" "v")])
 
 ;; Call subroutine returning any type.
 
@@ -6338,7 +6650,8 @@
   xops[1] = stack_pointer_rtx;
   output_asm_insn (AS2 (sub%L1,%0,%1), xops);
   RET;
-}")
+}"
+  [(set_attr "pipes" "uv")])
 
 (define_insn "prologue_set_got"
   [(set (match_operand:SI 0 "" "")
@@ -6362,7 +6675,8 @@
       output_asm_insn (buffer, operands);
     }    
   RET;
-}")
+}"
+  [(set_attr "pipes" "uv")])
 
 (define_insn "prologue_get_pc"
   [(set (match_operand:SI 0 "" "")
@@ -6378,7 +6692,8 @@
       ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, \"L\", CODE_LABEL_NUMBER (operands[1]));
     }    
   RET;
-}")
+}"
+  [(set_attr "pipes" "uv")])
 
 (define_insn "prologue_get_pc_and_set_got"
   [(unspec_volatile [(match_operand:SI 0 "" "")] 3)]
@@ -6756,7 +7071,8 @@
 			 (match_operand:DF 2 "nonimmediate_operand" "fm,0")]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6773,7 +7089,8 @@
 	    (match_operand:DF 2 "register_operand" "0")]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6790,7 +7107,8 @@
 			 (match_operand:XF 2 "register_operand" "f,0")]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6807,7 +7125,8 @@
 	    (match_operand:XF 2 "register_operand" "0")]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6824,7 +7143,8 @@
 	    (match_operand:XF 2 "register_operand" "0,f")]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6841,7 +7161,8 @@
 	   (float:XF (match_operand:SI 2 "nonimmediate_operand" "rm"))]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6859,7 +7180,8 @@
 	    (match_operand:SF 2 "nonimmediate_operand" "fm,0"))]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6876,7 +7198,8 @@
 	    (match_operand:DF 2 "register_operand" "0,f")]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6893,7 +7216,8 @@
 	   (float:DF (match_operand:SI 2 "nonimmediate_operand" "rm"))]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6911,7 +7235,8 @@
 	    (match_operand:SF 2 "nonimmediate_operand" "fm,0"))]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6928,7 +7253,8 @@
 			 (match_operand:SF 2 "nonimmediate_operand" "fm,0")]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6945,7 +7271,8 @@
 	   (match_operand:SF 2 "register_operand" "0")]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
@@ -6962,7 +7289,8 @@
 	   (float:SF (match_operand:SI 2 "nonimmediate_operand" "rm"))]))]
   "TARGET_80387"
   "* return output_387_binary_op (insn, operands);"
-  [(set (attr "type") 
+  [(set_attr "pipes" "fx")
+   (set (attr "type") 
         (cond [(match_operand:DF 3 "is_mul" "") 
                  (const_string "fpmul")
                (match_operand:DF 3 "is_div" "") 
-- 
                      OK. Let's make a signature file.
+-------------------------------------------------------------------------+
|        Jan Hubicka (Jan Hubi\v{c}ka in TeX) hubicka@freesoft.cz         |
|         Czech free software foundation: http://www.freesoft.cz          |
|AA project - the new way for computer graphics - http://www.ta.jcu.cz/aa |
|  homepage: http://www.paru.cas.cz/~hubicka/, games koules, Xonix, fast  |
|  fractal zoomer XaoS, index of Czech GNU/Linux/UN*X documentation etc.  | 
+-------------------------------------------------------------------------+

