This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[3.5 patch] i386.md: Fix target/11877.


Hi,

Attached is a patch to fix target/11877.

This is 3.5 material.  I am posting this so that the patch can be
referenced from the PR.

Consider

void
foo (long long *p)
{
  *p = 0;
}

Current gcc produces a 19-byte sequence:

foo:
	movl	4(%esp), %eax
	movl	$0, (%eax)
	movl	$0, 4(%eax)
	ret

With the patch, the above code is reduced to 14 bytes:

foo:
	movl	4(%esp), %eax
	xorl	%edx, %edx
	movl	%edx, (%eax)
	movl	%edx, 4(%eax)
	ret

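The patch does this transformation with a peephole2.  There is an
existing splitter that splits a DImode move after reload.  The patch
delays that split if we are storing 0 into memory at a non-constant
address, so that the new peephole2 still sees the DImode store.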
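
To make the delay rule easier to read, here is a rough sketch of the
new part of the split condition as a standalone C predicate.  It is
only a model: struct di_move and split_allowed are hypothetical names
that stand in for the RTL tests in the patch, and the pre-existing
checks (!TARGET_64BIT, reload_completed, the MMX/SSE register tests)
are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RTL tests in the patched condition.  */
struct di_move
{
  bool dest_is_mem;           /* GET_CODE (operands[0]) == MEM */
  bool dest_addr_is_constant; /* CONSTANT_ADDRESS_P (XEXP (operands[0], 0)) */
  bool src_is_zero;           /* operands[1] == const0_rtx */
};

/* Return true if the DImode splitter may fire now.  A store of 0 to
   memory at a non-constant address is held back until flow2 is done,
   unless peephole2 is disabled and nothing could use the delay.  */
static bool
split_allowed (const struct di_move *m, bool flow2_completed,
               bool flag_peephole2)
{
  bool zero_store_to_mem = (m->dest_is_mem
                            && !m->dest_addr_is_constant
                            && m->src_is_zero);

  return !zero_store_to_mem || flow2_completed || !flag_peephole2;
}
```

Under this model, "*p = 0" is the only case that is held back; a store
of 0 to a constant address, or a store of a nonzero value, is split as
before.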

This transformation happens 10 times or so in GCC.  In most of the
cases, "xorl %edx,%edx" is scheduled a lot earlier than the use of
%edx, so I don't think this has a negative effect.  However, I saw two
object files grow by 10 bytes or so, which I have not analyzed.

The DImode splitter has a comment saying

  ;; %%% This multiword shite has got to go.

so it's quite possible that I am making i386.md dirtier.

Tested on i686-pc-linux-gnu.

Kazu Hirata

2004-01-04  Kazu Hirata  <kazu@cs.umass.edu>

	PR target/11877
	* config/i386/i386.md (multiword split): Delay the split if
	storing 0 into memory.
	(one peephole2): New.

Index: i386.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.md,v
retrieving revision 1.499
diff -c -r1.499 i386.md
*** i386.md	3 Jan 2004 00:40:31 -0000	1.499
--- i386.md	3 Jan 2004 22:06:59 -0000
***************
*** 1935,1943 ****
          (match_operand:DI 1 "general_operand" ""))]
    "!TARGET_64BIT && reload_completed
     && (!MMX_REG_P (operands[0]) && !SSE_REG_P (operands[0]))
!    && (!MMX_REG_P (operands[1]) && !SSE_REG_P (operands[1]))"
    [(const_int 0)]
    "ix86_split_long_move (operands); DONE;")
  
  (define_insn "*movdi_1_rex64"
    [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,mr,!mr,!*y,!rm,!*y,!*Y,!rm,!*Y")
--- 1935,1965 ----
          (match_operand:DI 1 "general_operand" ""))]
    "!TARGET_64BIT && reload_completed
     && (!MMX_REG_P (operands[0]) && !SSE_REG_P (operands[0]))
!    && (!MMX_REG_P (operands[1]) && !SSE_REG_P (operands[1]))
!    && (!(GET_CODE (operands[0]) == MEM
! 	 && !CONSTANT_ADDRESS_P (XEXP (operands[0], 0))
! 	 && operands[1] == const0_rtx)
!        || flow2_completed
!        || !flag_peephole2)"
    [(const_int 0)]
    "ix86_split_long_move (operands); DONE;")
+ 
+ ;; Storing (const_int 0) into a (mem:DI) can be done efficiently by
+ ;; clearing a scratch reg:SI and copying it to two mem:SI locations.
+ 
+ (define_peephole2
+   [(match_scratch:SI 1 "r")
+    (set (match_operand:DI 0 "memory_operand" "")
+         (const_int 0))]
+   "peep2_regno_dead_p (0, FLAGS_REG)"
+   [(parallel [(set (match_dup 1)
+ 		   (const_int 0))
+ 	      (clobber (reg:CC 17))])
+    (set (match_dup 2)
+ 	(match_dup 1))
+    (set (match_dup 3)
+ 	(match_dup 1))]
+   "split_di (&operands[0], 1, &operands[2], &operands[3]);")
  
  (define_insn "*movdi_1_rex64"
    [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,mr,!mr,!*y,!rm,!*y,!*Y,!rm,!*Y")

