Flag for handling inlining of strcmp/memcmp on i386

Martin Thuresson martint@google.com
Tue Sep 29 20:50:00 GMT 2009


GCC currently inlines memcmp and strcmp as repz cmpsb when
optimizing.  Because the library implementations are optimized, for
example by reading full, aligned words, the inlined byte-by-byte
comparison is usually slower than calling the library functions.
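
As a rough illustration (not part of the patch), the sketch below
contrasts a byte-at-a-time loop, which is roughly the work repz cmpsb
performs, with a word-at-a-time loop of the kind a library memcmp may
use.  The names cmp_bytewise and cmp_wordwise are hypothetical, and
real library implementations are considerably more elaborate
(alignment handling, SIMD); this only shows the idea.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time comparison, roughly the work repz cmpsb performs.  */
int cmp_bytewise(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    for (size_t i = 0; i < n; i++)
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    return 0;
}

/* Word-at-a-time comparison: skip over equal words, then let the
   byte loop locate the first differing byte in what remains.  */
int cmp_wordwise(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    size_t i = 0;
    while (n - i >= sizeof(uintptr_t)) {
        uintptr_t x, y;
        memcpy(&x, p + i, sizeof x);  /* memcpy sidesteps alignment and aliasing issues */
        memcpy(&y, q + i, sizeof y);
        if (x != y)
            break;
        i += sizeof(uintptr_t);
    }
    return cmp_bytewise(p + i, q + i, n - i);
}
```

For long equal buffers the word loop does one compare per
sizeof(uintptr_t) bytes, which is where the library call gains its
advantage.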

The diagrams linked below show performance numbers for the library
call and the inlined version.  The numbers are from a
microbenchmark that compares buffers of various lengths, both
equal and unequal.
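
The benchmark source is not included in this mail.  A minimal sketch
of such a measurement might look like the following; the helper name
time_memcmp, the choice to put the difference in the last byte, and
the use of clock() are all assumptions, and the empty asm statement
relies on GCC-style inline assembly.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `iters` memcmp calls on two buffers of
   `len` bytes.  If `equal` is zero the buffers differ in the last
   byte, so the comparison still has to scan the whole length.
   Returns elapsed CPU seconds, or -1.0 if allocation fails.  */
double time_memcmp(size_t len, int equal, long iters)
{
    unsigned char *a = malloc(len + 1);
    unsigned char *b = malloc(len + 1);
    if (!a || !b) {
        free(a);
        free(b);
        return -1.0;
    }
    memset(a, 'x', len + 1);
    memset(b, 'x', len + 1);
    if (!equal && len > 0)
        b[len - 1] = 'y';

    volatile int sink = 0;  /* the result must be stored on every iteration */
    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        /* Compiler barrier: the buffers may have changed, so the
           comparison cannot be hoisted out of the loop.  */
        __asm__ volatile("" : : "r"(a), "r"(b) : "memory");
        sink = memcmp(a, b, len);
    }
    clock_t stop = clock();
    (void)sink;

    free(a);
    free(b);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling this for a range of lengths, once with equal and once with
unequal buffers, and building it both with and without the inlined
expansion gives the two curves to compare.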

http://www.ce.chalmers.se/~martin/foo/amd_opteron_call_repz.png
http://www.ce.chalmers.se/~martin/foo/intel_core_call_repz.png

The performance impact can be large for programs that compare large
strings that are expected to be equal, though I did not see any
performance change on SPEC2006 (less than 1% difference).

This patch introduces the flag -minline-compares, which controls
the inlining: without it, the cmpstrnsi expander fails and the
library function is called instead.

Thanks,
Martin
-------------- next part --------------
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 152256)
+++ doc/invoke.texi	(working copy)
@@ -594,7 +594,8 @@ Objective-C and Objective-C++ Dialects}.
 -maes -mpclmul @gol
 -msse4a -m3dnow -mpopcnt -mabm @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
--minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
+-minline-compares -minline-stringops-dynamically @gol
+-mstringop-strategy=@var{alg} @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
 -m96bit-long-double  -mregparm=@var{num}  -msseregparm @gol
 -mveclibabi=@var{type} -mpc32 -mpc64 -mpc80 -mstackrealign @gol
@@ -11886,6 +11887,12 @@ aligned at least to 4 byte boundary.  Th
 size, but may improve performance of code that depends on fast memcpy, strlen
 and memset for short lengths.
 
+@item -minline-compares
+@opindex minline-compares
+This option enables GCC to inline calls to memcmp and strcmp.  The
+inlined version does a byte-by-byte comparison using a repeat string
+operation prefix.
+
 @item -minline-stringops-dynamically
 @opindex minline-stringops-dynamically
 For string operation of unknown size, inline runtime checks so for small
Index: testsuite/gcc.dg/20050503-1.c
===================================================================
--- testsuite/gcc.dg/20050503-1.c	(revision 152256)
+++ testsuite/gcc.dg/20050503-1.c	(working copy)
@@ -3,7 +3,8 @@
    expanders.  */
 /* { dg-do compile } */
 /* { dg-skip-if "" { { i?86-*-* x86_64-*-* } && { ilp32 && { ! nonpic } } } { "*" } { "" } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -minline-compares" { target { i?86-*-* || x86_64-*-* } } } */
+/* { dg-options "-O2" { target {! { i?86-*-* || x86_64-*-* } } } } */
 
 typedef __SIZE_TYPE__ size_t;
 
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 152256)
+++ config/i386/i386.md	(working copy)
@@ -20033,6 +20033,9 @@ (define_expand "cmpstrnsi"
 {
   rtx addr1, addr2, out, outlow, count, countreg, align;
 
+  if (!TARGET_INLINE_COMPARES)
+    FAIL;
+
   if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS)
     FAIL;
 
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 152256)
+++ config/i386/i386.opt	(working copy)
@@ -140,6 +140,10 @@ minline-all-stringops
 Target Report Mask(INLINE_ALL_STRINGOPS) Save
 Inline all known string operations
 
+minline-compares
+Target Report Mask(INLINE_COMPARES) Save
+Inline compare operations strcmp and memcmp
+
 minline-stringops-dynamically
 Target Report Mask(INLINE_STRINGOPS_DYNAMICALLY) Save
 Inline memset/memcpy string operations, but perform inline version only for small blocks
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 152256)
+++ config/i386/i386.c	(working copy)
@@ -2393,6 +2393,7 @@ ix86_target_string (int isa, int flags, 
     { "-mfp-ret-in-387",		MASK_FLOAT_RETURNS },
     { "-mieee-fp",			MASK_IEEE_FP },
     { "-minline-all-stringops",		MASK_INLINE_ALL_STRINGOPS },
+    { "-minline-compares",              MASK_INLINE_COMPARES },
     { "-minline-stringops-dynamically",	MASK_INLINE_STRINGOPS_DYNAMICALLY },
     { "-mms-bitfields",			MASK_MS_BITFIELD_LAYOUT },
     { "-mno-align-stringops",		MASK_NO_ALIGN_STRINGOPS },
@@ -3642,6 +3643,10 @@ ix86_valid_target_attribute_inner_p (tre
 		   OPT_minline_all_stringops,
 		   MASK_INLINE_ALL_STRINGOPS),
 
+    IX86_ATTR_YES ("inline-compares",
+		   OPT_minline_compares,
+		   MASK_INLINE_COMPARES),
+
     IX86_ATTR_YES ("inline-stringops-dynamically",
 		   OPT_minline_stringops_dynamically,
 		   MASK_INLINE_STRINGOPS_DYNAMICALLY),
-------------- next part --------------
2009-09-29  Martin Thuresson  <martint@google.com>

	* config/i386/i386.c (ix86_target_string)
	(ix86_valid_target_attribute_inner_p): Add minline-compares support.

	* config/i386/i386.md (cmpstrnsi): Update conditional.

	* config/i386/i386.opt (minline-compares): Add.

	* doc/invoke.texi (minline-compares): Document.

2009-09-29  Martin Thuresson  <martint@google.com>

	* gcc.dg/20050503-1.c: Adjust to use minline-compares.

