Bug 46647 - Can't inline memset with -1
Summary: Can't inline memset with -1
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: 4.6.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2010-11-24 20:42 UTC by H.J. Lu
Modified: 2011-02-10 02:03 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
A patch (728 bytes, patch)
2010-11-24 22:22 UTC, H.J. Lu
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2010-11-24 20:42:04 UTC
[hjl@gnu-16 gcc]$ cat /tmp/x.i 
char a[4];
int
func1 (void)
{
  __builtin_memset (a,-1,sizeof (a));
  return 0;
}

int
func2 (void)
{
  __builtin_memset (a,123,sizeof (a));
  return 0;
}
[hjl@gnu-16 gcc]$ ./xgcc -B./ -S /tmp/x.i -O2 -m32
[hjl@gnu-16 gcc]$ cat x.s
	.file	"x.i"
	.text
	.p2align 4,,15
	.globl	func1
	.type	func1, @function
func1:
.LFB0:
	.cfi_startproc
	subl	$28, %esp
	.cfi_def_cfa_offset 32
	movl	$4, 8(%esp)
	movl	$-1, 4(%esp)
	movl	$a, (%esp)
	call	memset
	xorl	%eax, %eax
	addl	$28, %esp
	.cfi_def_cfa_offset 4
	ret
	.cfi_endproc
.LFE0:
	.size	func1, .-func1
	.p2align 4,,15
	.globl	func2
	.type	func2, @function
func2:
.LFB1:
	.cfi_startproc
	movl	$2071690107, a
	xorl	%eax, %eax
	ret
	.cfi_endproc
.LFE1:
	.size	func2, .-func2
	.comm	a,4,1
	.ident	"GCC: (GNU) 4.6.0 20101124 (experimental) [trunk revision 167121]"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-16 gcc]$ 

Why can't we inline  __builtin_memset (a,123,sizeof (a))?
Comment 1 H.J. Lu 2010-11-24 20:47:25 UTC
We can inline 0x7fffffff and fail 0x8fffffff.
Comment 2 H.J. Lu 2010-11-24 20:57:03 UTC
It is target independent.
Comment 3 H.J. Lu 2010-11-24 21:08:10 UTC
fold_builtin_memset fails to handle 0x[8-f]fffffff.
Comment 4 Jakub Jelinek 2010-11-24 21:36:06 UTC
If the testcase starts with int a[1]; instead of char a[4];, then:
--- builtins.c.jj32010-11-19 20:56:54.000000000 +0100
+++ builtins.c2010-11-24 22:23:41.000000000 +0100
@@ -8345,7 +8345,7 @@ fold_builtin_memset (location_t loc, tre
   if (integer_zerop (len))
     return omit_one_operand_loc (loc, type, dest, c);
 
-  if (! host_integerp (c, 1) || TREE_SIDE_EFFECTS (dest))
+  if (TREE_CODE (c) != INTEGER_CST || TREE_SIDE_EFFECTS (dest))
     return NULL_TREE;
 
   var = dest;
@@ -8384,7 +8384,7 @@ fold_builtin_memset (location_t loc, tre
       if (CHAR_BIT != 8 || BITS_PER_UNIT != 8 || HOST_BITS_PER_WIDE_INT > 64)
         return NULL_TREE;
 
-      cval = tree_low_cst (c, 1);
+      cval = TREE_INT_CST_LOW (c);
       cval &= 0xff;
       cval |= cval << 8;
       cval |= cval << 16;

should fix this.  But with char a[4]; instead this isn't something that is optimized at the tree level at all, so similar change will be needed somewhere on the expander side.
Comment 5 H.J. Lu 2010-11-24 21:43:11 UTC
char a[4] isn't handled by fold_builtin_memset.
Comment 6 H.J. Lu 2010-11-24 21:52:46 UTC
The problem seems to be target_char_cast.
Comment 7 H.J. Lu 2010-11-24 22:00:25 UTC
This seems to work:

---
diff --git a/gcc/builtins.c b/gcc/builtins.c
index c9e8e68..840ec29 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -630,11 +630,10 @@ target_char_cast (tree cst, char *p)
 {
   unsigned HOST_WIDE_INT val, hostval;
 
-  if (!host_integerp (cst, 1)
-      || CHAR_TYPE_SIZE > HOST_BITS_PER_WIDE_INT)
+  if (CHAR_TYPE_SIZE > HOST_BITS_PER_WIDE_INT)
     return 1;
 
-  val = tree_low_cst (cst, 1);
+  val = TREE_INT_CST_LOW (cst);
   if (CHAR_TYPE_SIZE < HOST_BITS_PER_WIDE_INT)
     val &= (((unsigned HOST_WIDE_INT) 1) << CHAR_TYPE_SIZE) - 1;
----
Comment 8 Jakub Jelinek 2010-11-24 22:04:11 UTC
I'd prefer to replace the host_integerp call in that routine with
TREE_CODE (...) == INTEGER_CST check.  While currently callers check it as well, it isn't too expensive and makes it more robust.
Comment 9 H.J. Lu 2010-11-24 22:11:03 UTC
(In reply to comment #8)
> I'd prefer to replace the host_integerp call in that routine with
> TREE_CODE (...) == INTEGER_CST check.  While currently callers check it as
> well, it isn't too expensive and makes it more robust.

/* Cast a target constant CST to target CHAR and if that value fits into
   host char type, return zero and put that value into variable pointed to by
   P.  */

static int
target_char_cast (tree cst, char *p)

It is supposed to be called on integer constant. Adding a check
shouldn't be too bad:

diff --git a/gcc/builtins.c b/gcc/builtins.c
index c9e8e68..a90bf2f 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -630,11 +630,11 @@ target_char_cast (tree cst, char *p)
 {
   unsigned HOST_WIDE_INT val, hostval;
 
-  if (!host_integerp (cst, 1)
+  if (TREE_CODE (cst) != INTEGER_CST
       || CHAR_TYPE_SIZE > HOST_BITS_PER_WIDE_INT)
     return 1;
 
-  val = tree_low_cst (cst, 1);
+  val = TREE_INT_CST_LOW (cst);
   if (CHAR_TYPE_SIZE < HOST_BITS_PER_WIDE_INT)
     val &= (((unsigned HOST_WIDE_INT) 1) << CHAR_TYPE_SIZE) - 1;
Comment 10 H.J. Lu 2010-11-24 22:22:17 UTC
Created attachment 22519 [details]
A patch

I am testing it on Linux/x86-64.
Comment 11 hjl@gcc.gnu.org 2010-11-25 13:48:06 UTC
Author: hjl
Date: Thu Nov 25 13:47:42 2010
New Revision: 167146

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167146
Log:
Properly cast integer constant char. 

gcc/

2010-11-25  H.J. Lu  <hongjiu.lu@intel.com>

	PR middle-end/46647
	* builtins.c (target_char_cast): Check INTEGER_CST instead of
	host_integerp.  Replace tree_low_cst with TREE_INT_CST_LOW.

gcc/testsuite/

2010-11-25  H.J. Lu  <hongjiu.lu@intel.com>

	PR middle-end/46647
	* gcc.target/i386/pr46647.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr46647.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/testsuite/ChangeLog
Comment 12 Jakub Jelinek 2010-11-26 09:38:59 UTC
Author: jakub
Date: Fri Nov 26 09:38:54 2010
New Revision: 167170

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167170
Log:
	PR middle-end/46647
	* builtins.c (fold_builtin_memset): Check c is INTEGER_CST instead
	of host_integerp check.  Use TREE_INT_CST_LOW instead of tree_low_cst.

	* gcc.dg/pr46647.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr46647.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/testsuite/ChangeLog
Comment 13 H.J. Lu 2010-11-28 14:33:01 UTC
Fixed.