[PATCH] builtin expansion of memcmp for powerpc

Segher Boessenkool segher@kernel.crashing.org
Thu Sep 22 22:24:00 GMT 2016


Hi Aaron,

On Thu, Sep 22, 2016 at 03:10:24PM -0500, Aaron Sawdey wrote:
> The powerpc target had a movmemsi pattern which supports memcpy() but
> did not have anything for memcmp(). This adds support for builtin
> expansion of memcmp() into inline code for modest constant lengths.
> Performance on power8 is in the range of 3-7x faster than calling
> memcmp() for lengths under 40 bytes.
> 
> Bootstrap on powerpc64le, regtest in progress, OK for trunk if no new
> regressions?

[ You are testing on BE and 32-bit as well, thanks. ]

Just a bit more formatting, and one nit:

> 2016-09-22  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>
> 
	PR target/77685   <<<--- please add this here
> 	* config/rs6000/rs6000.md (cmpmemsi): New define_expand.
> 	* config/rs6000/rs6000.c (expand_block_compare): New function used by
> 	cmpmemsi pattern to do builtin expansion of memcmp().

Space before (, here and below.

> 	(compute_current_alignment): Add helper function for
> 	expand_block_compare used to compute alignment as the compare proceeds.
> 	(select_block_compare_mode): Used by expand_block_compare to select
> 	the mode used for reading the next chunk of bytes in the compare.
> 	(do_load_for_compare): Used by expand_block_compare to emit the load
> 	insns for the compare.
> 	(rs6000_emit_dot_insn): Moved this function to avoid a forward
> 	reference from expand_block_compare().
> 	* config/rs6000/rs6000-protos.h (expand_block_compare): Add a
> 	prototype for this function.
> 	* config/rs6000/rs6000.opt (mblock-compare-inline-limit): Add a new
> 	target option for controlling how much code inline expansion of
> 	memcmp() will be allowed to generate.

You can just say "New function." and the like for most things here, fwiw.

> +/* Figure out the correct instructions to generate to load data for
> +   block compare.  MODE is used for the read from memory, and
> +   data is zero extended if REG is wider than MODE.  If LE code
> +   is being generated, bswap loads are used.

"Emit a load for a block compare", maybe?  I.e. say it emits code, and
it does only one load.

> +  /* final fallback is do one byte */

Capital, dot space space.

> +  /* If this is not a fixed size compare, just call memcmp */

Dot space space.

> +  /* This must be a fixed size alignment */

Dot space space.

> +  /* SLOW_UNALIGNED_ACCESS -- don't do unaligned stuff */

Full sentence.

> +  /* Anything to move? */

Question mark space space.

> +  int offset = 0;
> +  machine_mode load_mode =
> +    select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
> +  int load_mode_size = GET_MODE_SIZE (load_mode);

HOST_WIDE_INT instead of the ints?  It isn't slower and I find it hard to
convince myself int always works.

> +	  int extra_bytes = load_mode_size - bytes;

Here too.

> +      int remain = bytes - cmp_bytes;

One more.

Very nice improvement, thank you!  This is okay for trunk.


Segher



More information about the Gcc-patches mailing list