zero extensions for PPro
Jeffrey A Law
law@hurl.cygnus.com
Tue Mar 9 00:55:00 GMT 1999
As I mentioned a short while ago, I have some changes to improve how we
handle zero extensions for the x86 port.
First, when optimizing for size, "movz" wins consistently over masking with
an "and", regardless of the source operand.
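For reference, here is a small C sketch of the two ways of getting a zero-extended 16-bit value. The function names are made up for illustration and are not from the port; only the 65535 mask comes from the existing patterns. Both produce the same result, so the choice between them is purely a size/speed question.

```c
#include <stdint.h>

/* What movzwl computes: widen a 16-bit source and fill the
   upper 16 bits of the destination with zeros.  */
uint32_t zext_hi_movz(uint16_t src)
{
    return (uint32_t)src;
}

/* What the masking form computes: the value already sits in a
   32-bit register whose upper half may hold stale bits, and an
   "and" with 65535 clears them.  */
uint32_t zext_hi_and(uint32_t reg)
{
    return reg & 0xFFFFu;
}
```

For example, zext_hi_and (0xDEADBEEF) and zext_hi_movz (0xBEEF) both give 0x0000BEEF.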
PPro/PII (speed):
  movz reg,reg              --> 1 uop   P01
  movz mem,reg              --> 1 uop   P2
  and imm,reg               --> 1 uop   P01
  mov mem,reg;and imm,reg   --> 2 uops  P2, P01
So movz never needs more uops than the masking sequence and is smaller.
That means we always want to use movz on PPro/PII.
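The comparison can be written down as a tiny C sketch. The uop counts are the ones from the table above; the enum and function names are hypothetical, and the register-source "and" case assumes the value is already in the destination register (otherwise a copy would be needed first).

```c
/* Where the narrow source operand lives.  */
enum src_kind { SRC_REG, SRC_MEM };

/* movz is a single uop for either kind of source.  */
int movz_uops(enum src_kind src)
{
    (void)src;
    return 1;
}

/* Masking is one uop in place, but a memory source needs a
   separate load first, giving two uops.  */
int and_uops(enum src_kind src)
{
    return src == SRC_MEM ? 2 : 1;
}

/* Returns 1 if movz needs no more uops than masking for every
   source kind in the table.  */
int movz_never_worse(void)
{
    return movz_uops(SRC_REG) <= and_uops(SRC_REG)
           && movz_uops(SRC_MEM) <= and_uops(SRC_MEM);
}
```

This only models uop counts, not port pressure or code size.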
For other processors I think our code generation was reasonable, but it tended
to generate multiple instructions from within the extension patterns. Those
cases should be using a splitter.
We want to use a separate pattern for space/PPro opts so that we can have the
constraints precisely match what the pattern supports.
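A rough C sketch of how the two anonymous patterns divide the work: the insn conditions in the patch are complements of each other, so exactly one pattern is enabled for any given compilation. The enum here is a simplified stand-in, not the port's real processor list; only PROCESSOR_PENTIUMPRO and optimize_size are names taken from the patch.

```c
/* Simplified stand-in for the port's processor enumeration.  */
enum sketch_processor { PROCESSOR_OTHER, PROCESSOR_PENTIUMPRO };

/* Condition on the new pattern that always emits movz.  */
int use_movz_pattern(int optimize_size, enum sketch_processor cpu)
{
    return optimize_size || (int)cpu == (int)PROCESSOR_PENTIUMPRO;
}

/* Condition on the existing generic pattern: the exact negation,
   so the two patterns never both match.  */
int use_generic_pattern(int optimize_size, enum sketch_processor cpu)
{
    return !use_movz_pattern(optimize_size, cpu);
}
```

Because the conditions are disjoint, each pattern can carry its own constraints without the two competing in recog.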
This patch deals with getting the size and PPro/PII optimization cases
correct. I'll send the splitter changes separately. Then we need to update
the andXX patterns.
My results with this patch are similar to HJ's. Compression gets a little
bit faster; decompression gets quite a bit faster.
* i386.md (zero_extendhisi2): Split into an expander and anonymous
pattern. Add new anonymous pattern for use when optimizing for
size or for the PPro.
(zero_extendqihi2, zero_extendqisi2): Likewise.
Index: i386.md
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/config/i386/i386.md,v
retrieving revision 1.51
diff -c -3 -p -r1.51 i386.md
*** i386.md 1999/03/08 23:31:28 1.51
--- i386.md 1999/03/09 08:44:50
***************
*** 1789,1798 ****
;;- zero extension instructions
;; See comments by `andsi' for when andl is faster than movzx.
! (define_insn "zero_extendhisi2"
[(set (match_operand:SI 0 "register_operand" "=r,&r,?r")
(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0,rm,rm")))]
! ""
"*
{
rtx xops[2];
--- 1789,1813 ----
;;- zero extension instructions
;; See comments by `andsi' for when andl is faster than movzx.
! (define_expand "zero_extendhisi2"
! [(set (match_operand:SI 0 "register_operand" "")
! (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "")))]
! ""
! "")
!
! ;; When optimizing for the PPro/PII or code size, always use movzwl.
! ;; We want to use a different pattern so we can use different constraints
! ;; than the generic pattern.
! (define_insn ""
! [(set (match_operand:SI 0 "register_operand" "=r")
! (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "rm")))]
! "(optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
! "* return AS2 (movz%W0%L0,%1,%0);")
!
! (define_insn ""
[(set (match_operand:SI 0 "register_operand" "=r,&r,?r")
(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0,rm,rm")))]
! "! (optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
"*
{
rtx xops[2];
***************
*** 1852,1862 ****
(and:SI (match_dup 0)
(const_int 65535)))]
"operands[2] = gen_rtx_REG (HImode, true_regnum (operands[0]));")
! (define_insn "zero_extendqihi2"
[(set (match_operand:HI 0 "register_operand" "=q,&q,?r")
(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
! ""
"*
{
rtx xops[2];
--- 1867,1890 ----
(and:SI (match_dup 0)
(const_int 65535)))]
"operands[2] = gen_rtx_REG (HImode, true_regnum (operands[0]));")
+
+ (define_expand "zero_extendqihi2"
+ [(set (match_operand:HI 0 "register_operand" "")
+ (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "")))]
+ ""
+ "")
+
+ (define_insn ""
+ [(set (match_operand:HI 0 "register_operand" "=r")
+ (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
+ "optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO"
+
+ "* return AS2 (movz%B0%W0,%1,%0);")
! (define_insn ""
[(set (match_operand:HI 0 "register_operand" "=q,&q,?r")
(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
! "! (optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
"*
{
rtx xops[2];
***************
*** 1934,1943 ****
FAIL;
operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));")
! (define_insn "zero_extendqisi2"
[(set (match_operand:SI 0 "register_operand" "=q,&q,?r")
(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
! ""
"*
{
rtx xops[2];
--- 1962,1983 ----
FAIL;
operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));")
! (define_expand "zero_extendqisi2"
! [(set (match_operand:SI 0 "register_operand" "")
! (zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "")))]
! ""
! "")
!
! (define_insn ""
! [(set (match_operand:SI 0 "register_operand" "=r")
! (zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
! "optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO"
! "* return AS2 (movz%B0%L0,%1,%0);")
!
! (define_insn ""
[(set (match_operand:SI 0 "register_operand" "=q,&q,?r")
(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
! "! (optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
"*
{
rtx xops[2];