This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
zero extensions for PPro
- To: medtekh at orc dot ru
- Subject: zero extensions for PPro
- From: Jeffrey A Law <law at hurl dot cygnus dot com>
- Date: Tue, 09 Mar 1999 01:52:50 -0700
- cc: "Martin v. Loewis" <martin at mira dot isdn dot cs dot tu-berlin dot de>, hjl at varesearch dot com, egcs-patches at egcs dot cygnus dot com
- Reply-To: law at cygnus dot com
As I mentioned a short while ago, I have some changes to improve how we
handle zero extensions for the x86 port.
First, when optimizing for size, "movz" wins consistently over masking with
an "and", regardless of the source operand.
PPro/PII (speed), where P01 = port 0 or 1 (ALU) and P2 = port 2 (load):
    movz reg,reg              --> 1 uop   P01
    movz mem,reg              --> 1 uop   P2
    and imm,reg               --> 1 uop   P01
    mov mem,reg; and imm,reg  --> 2 uops  P2, P01
So movz always generates the minimum number of uops and is also smaller;
we therefore always want to use movz on the PPro/PII.
For other processors I think our code generation was reasonable, but it tended
to emit multiple instructions from within the extension patterns themselves.
Those cases should use a splitter instead.
We want a separate pattern for the size and PPro optimizations so that its
constraints precisely match what the pattern supports.
This patch gets the size and PPro/PII optimization cases correct. I'll send
the splitter changes separately; after that, the andXX patterns need to be
updated.
My results with this patch are similar to HJ's: compression gets a little
faster, and decompression is quite a bit faster.
* i386.md (zero_extendhisi2): Split into an expander and anonymous
pattern. Add new anonymous pattern for use when optimizing for
size or for the PPro.
(zero_extendqihi2, zero_extendqisi2): Likewise.
Index: i386.md
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/config/i386/i386.md,v
retrieving revision 1.51
diff -c -3 -p -r1.51 i386.md
*** i386.md 1999/03/08 23:31:28 1.51
--- i386.md 1999/03/09 08:44:50
***************
*** 1789,1798 ****
;;- zero extension instructions
;; See comments by `andsi' for when andl is faster than movzx.
! (define_insn "zero_extendhisi2"
[(set (match_operand:SI 0 "register_operand" "=r,&r,?r")
(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0,rm,rm")))]
! ""
"*
{
rtx xops[2];
--- 1789,1813 ----
;;- zero extension instructions
;; See comments by `andsi' for when andl is faster than movzx.
! (define_expand "zero_extendhisi2"
! [(set (match_operand:SI 0 "register_operand" "")
! (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "")))]
! ""
! "")
!
! ;; When optimizing for the PPro/PII or code size, always use movzwl.
! ;; We want to use a different pattern so we can use different constraints
! ;; than the generic pattern.
! (define_insn ""
! [(set (match_operand:SI 0 "register_operand" "=r")
! (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "rm")))]
! "(optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
! "* return AS2 (movz%W0%L0,%1,%0);")
!
! (define_insn ""
[(set (match_operand:SI 0 "register_operand" "=r,&r,?r")
(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0,rm,rm")))]
! "! (optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
"*
{
rtx xops[2];
***************
*** 1852,1862 ****
(and:SI (match_dup 0)
(const_int 65535)))]
"operands[2] = gen_rtx_REG (HImode, true_regnum (operands[0]));")
! (define_insn "zero_extendqihi2"
[(set (match_operand:HI 0 "register_operand" "=q,&q,?r")
(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
! ""
"*
{
rtx xops[2];
--- 1867,1890 ----
(and:SI (match_dup 0)
(const_int 65535)))]
"operands[2] = gen_rtx_REG (HImode, true_regnum (operands[0]));")
+
+ (define_expand "zero_extendqihi2"
+ [(set (match_operand:HI 0 "register_operand" "")
+ (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "")))]
+ ""
+ "")
+
+ (define_insn ""
+ [(set (match_operand:HI 0 "register_operand" "=r")
+ (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
+ "optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO"
+
+ "* return AS2 (movz%B0%W0,%1,%0);")
! (define_insn ""
[(set (match_operand:HI 0 "register_operand" "=q,&q,?r")
(zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
! "! (optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
"*
{
rtx xops[2];
***************
*** 1934,1943 ****
FAIL;
operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));")
! (define_insn "zero_extendqisi2"
[(set (match_operand:SI 0 "register_operand" "=q,&q,?r")
(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
! ""
"*
{
rtx xops[2];
--- 1962,1983 ----
FAIL;
operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));")
! (define_expand "zero_extendqisi2"
! [(set (match_operand:SI 0 "register_operand" "")
! (zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "")))]
! ""
! "")
!
! (define_insn ""
! [(set (match_operand:SI 0 "register_operand" "=r")
! (zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
! "optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO"
! "* return AS2 (movz%B0%L0,%1,%0);")
!
! (define_insn ""
[(set (match_operand:SI 0 "register_operand" "=q,&q,?r")
(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "0,qm,qm")))]
! "! (optimize_size || (int)ix86_cpu == (int)PROCESSOR_PENTIUMPRO)"
"*
{
rtx xops[2];