This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[3.5 patch] i386.md: Fix target/11877.
- From: Kazu Hirata <kazu at cs dot umass dot edu>
- To: gcc-patches at gcc dot gnu dot org
- Date: Sun, 04 Jan 2004 01:55:12 -0500 (EST)
- Subject: [3.5 patch] i386.md: Fix target/11877.
Hi,
Attached is a patch to fix PR target/11877.
This is 3.5 material. I am posting it so that the patch can be
referenced from the PR.
Consider
void
foo (long long *p)
{
  *p = 0;
}
Current gcc produces a 19-byte sequence:
foo:
        movl    4(%esp), %eax
        movl    $0, (%eax)
        movl    $0, 4(%eax)
        ret
With the patch, the above code is reduced to 14 bytes:
foo:
        movl    4(%esp), %eax
        xorl    %edx, %edx
        movl    %edx, (%eax)
        movl    %edx, 4(%eax)
        ret
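As a rough illustration only (not part of the patch), the patched
sequence can be mimicked in C: clear one 32-bit scratch value and
store it to both halves of the 64-bit object. The function name
store_zero_di is hypothetical, and the sketch assumes that long long
is 8 bytes wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not GCC code: zero a 64-bit object the way the
   patched assembly does, with one cleared 32-bit scratch value stored
   to each half.  */
static void
store_zero_di (long long *p)
{
  unsigned char *bytes = (unsigned char *) p;
  uint32_t scratch = 0;                 /* xorl %edx, %edx */

  assert (p != NULL);
  assert (sizeof (long long) == 8);     /* assumption of this sketch */

  memcpy (bytes, &scratch, 4);          /* movl %edx, (%eax) */
  memcpy (bytes + 4, &scratch, 4);      /* movl %edx, 4(%eax) */
}
```

The result is the same as a plain "*p = 0;"; the saving comes from
the register stores not carrying a 4-byte immediate each.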
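The patch does this transformation with a peephole2 pattern. There is
already a splitter that splits a DImode move into SImode moves. The
patch delays that split if we are storing 0 into memory, so that the
peephole2 pass gets a chance to match the DImode store first.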
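To make the delay condition easier to read, here is a minimal C model
of just the clause the patch adds to the splitter condition. The
struct and function names are hypothetical and are not GCC
identifiers; the pre-existing checks (TARGET_64BIT, reload_completed,
the MMX/SSE register tests) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the facts the splitter condition tests.  */
struct move_info
{
  bool dest_is_mem;            /* GET_CODE (operands[0]) == MEM */
  bool dest_addr_is_constant;  /* CONSTANT_ADDRESS_P (XEXP (operands[0], 0)) */
  bool src_is_zero;            /* operands[1] == const0_rtx */
};

/* Return true if the DImode move may be split now.  A store of 0 to a
   non-constant memory address is held back until flow2 has completed,
   unless peephole2 is disabled.  */
static bool
split_allowed (const struct move_info *m, bool flow2_completed,
               bool peephole2_enabled)
{
  bool zero_store_candidate;

  assert (m != NULL);
  zero_store_candidate = (m->dest_is_mem
                          && !m->dest_addr_is_constant
                          && m->src_is_zero);
  return !zero_store_candidate || flow2_completed || !peephole2_enabled;
}
```

In this model only the zero-store candidate is held back, and only
while peephole2 is enabled and flow2 has not completed; every other
DImode move is split as before.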
This transformation happens 10 times or so in GCC. In most of the
cases, "xorl %edx,%edx" is scheduled a lot earlier than the use of
%edx, so I don't think this has a negative effect. However, I saw two
object files grow by 10 bytes or so, which I have not analyzed.
The DImode splitter has a comment saying
;; %%% This multiword shite has got to go.
so it's quite possible that I am making i386.md dirtier.
Tested on i686-pc-linux-gnu.
Kazu Hirata
2004-01-04 Kazu Hirata <kazu@cs.umass.edu>
PR target/11877
* config/i386/i386.md (multiword split): Delay the split if
storing 0 into memory.
(one peephole2): New.
Index: i386.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.md,v
retrieving revision 1.499
diff -c -r1.499 i386.md
*** i386.md 3 Jan 2004 00:40:31 -0000 1.499
--- i386.md 3 Jan 2004 22:06:59 -0000
***************
*** 1935,1943 ****
(match_operand:DI 1 "general_operand" ""))]
"!TARGET_64BIT && reload_completed
&& (!MMX_REG_P (operands[0]) && !SSE_REG_P (operands[0]))
! && (!MMX_REG_P (operands[1]) && !SSE_REG_P (operands[1]))"
[(const_int 0)]
"ix86_split_long_move (operands); DONE;")
(define_insn "*movdi_1_rex64"
[(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,mr,!mr,!*y,!rm,!*y,!*Y,!rm,!*Y")
--- 1935,1965 ----
(match_operand:DI 1 "general_operand" ""))]
"!TARGET_64BIT && reload_completed
&& (!MMX_REG_P (operands[0]) && !SSE_REG_P (operands[0]))
! && (!MMX_REG_P (operands[1]) && !SSE_REG_P (operands[1]))
! && (!(GET_CODE (operands[0]) == MEM
! && !CONSTANT_ADDRESS_P (XEXP (operands[0], 0))
! && operands[1] == const0_rtx)
! || flow2_completed
! || !flag_peephole2)"
[(const_int 0)]
"ix86_split_long_move (operands); DONE;")
+
+ ;; Storing (const_int 0) into a (mem:DI) can be done efficiently by
+ ;; clearing a scratch reg:SI and copying it to two mem:SI locations.
+
+ (define_peephole2
+ [(match_scratch:SI 1 "r")
+ (set (match_operand:DI 0 "memory_operand" "")
+ (const_int 0))]
+ "peep2_regno_dead_p (0, FLAGS_REG)"
+ [(parallel [(set (match_dup 1)
+ (const_int 0))
+ (clobber (reg:CC 17))])
+ (set (match_dup 2)
+ (match_dup 1))
+ (set (match_dup 3)
+ (match_dup 1))]
+ "split_di (&operands[0], 1, &operands[2], &operands[3]);")
(define_insn "*movdi_1_rex64"
[(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,mr,!mr,!*y,!rm,!*y,!*Y,!rm,!*Y")