[PATCH] builtin expansion of memcmp for powerpc
Segher Boessenkool
segher@kernel.crashing.org
Thu Sep 22 22:24:00 GMT 2016
Hi Aaron,
On Thu, Sep 22, 2016 at 03:10:24PM -0500, Aaron Sawdey wrote:
> The powerpc target had a movmemsi pattern which supports memcpy() but
> did not have anything for memcmp(). This adds support for builtin
> expansion of memcmp() into inline code for modest constant lengths.
> Performance on power8 is in the range of 3-7x faster than calling
> memcmp() for lengths under 40 bytes.
>
> Bootstrap on powerpc64le, regtest in progress, OK for trunk if no new
> regressions?
[ You are testing on BE and 32-bit as well, thanks. ]
Just a bit more formatting, and one nit:
> 2016-09-22  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>
>
PR target/77685 <<<--- please add this here
> * config/rs6000/rs6000.md (cmpmemsi): New define_expand.
> * config/rs6000/rs6000.c (expand_block_compare): New function used by
> cmpmemsi pattern to do builtin expansion of memcmp().
Space before (, here and below.
> (compute_current_alignment): Add helper function for
> expand_block_compare used to compute alignment as the compare proceeds.
> (select_block_compare_mode): Used by expand_block_compare to select
> the mode used for reading the next chunk of bytes in the compare.
> (do_load_for_compare): Used by expand_block_compare to emit the load
> insns for the compare.
> (rs6000_emit_dot_insn): Moved this function to avoid a forward
> reference from expand_block_compare().
> * config/rs6000/rs6000-protos.h (expand_block_compare): Add a
> prototype for this function.
> * config/rs6000/rs6000.opt (mblock-compare-inline-limit): Add a new
> target option for controlling how much code inline expansion of
> memcmp() will be allowed to generate.
You can just say "New function." and the like for most things here, fwiw.
> +/* Figure out the correct instructions to generate to load data for
> + block compare. MODE is used for the read from memory, and
> + data is zero extended if REG is wider than MODE. If LE code
> + is being generated, bswap loads are used.
"Emit a load for a block compare", maybe? I.e. say it emits code, and
it does only one load.
> + /* final fallback is do one byte */
Capital, dot space space.
> + /* If this is not a fixed size compare, just call memcmp */
Dot space space.
> + /* This must be a fixed size alignment */
Dot space space.
> + /* SLOW_UNALIGNED_ACCESS -- don't do unaligned stuff */
Full sentence.
> + /* Anything to move? */
Question mark space space.
> + int offset = 0;
> + machine_mode load_mode =
> + select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
> + int load_mode_size = GET_MODE_SIZE (load_mode);
HOST_WIDE_INT instead of the ints? It isn't slower and I find it hard to
convince myself int always works.
> + int extra_bytes = load_mode_size - bytes;
Here too.
> + int remain = bytes - cmp_bytes;
One more.
Very nice improvement, thank you! This is okay for trunk.
Segher
More information about the Gcc-patches
mailing list