This is the mail archive of the
mailing list for the GCC project.
k-byte memset/memcpy/strlen builtins
- From: Robin Dapp <rdapp at linux dot vnet dot ibm dot com>
- To: gcc at gcc dot gnu dot org
- Date: Wed, 11 Jan 2017 17:16:43 +0100
- Subject: k-byte memset/memcpy/strlen builtins
When examining the performance of some test cases on s390 I realized
that we could do better for constructs like 2-byte memcpys or
2-byte/4-byte memsets. Due to some s390-specific architectural
properties, we could be faster by e.g. avoiding excessive unrolling and
using dedicated memory instructions (or similar).
For 1-byte memset/memcpy the builtin functions provide a straightforward
way to achieve this. At first sight it seemed possible to extend
tree-loop-distribution.c to include the additional variants we need.
However, multibyte memsets/memcpys are not covered by the C standard and
I'm therefore unsure if such an approach is preferable or if there are
more idiomatic ways or better places to add the functionality.
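For illustration, here is a minimal sketch of the kind of loops in question. The function names (fill8, fill16, copy16) are made up for this example and do not come from GCC or from the test cases mentioned above. The 1-byte fill is the shape that corresponds directly to memset, while the 2-byte variants have no equally direct standard counterpart unless both bytes of the value happen to be equal. Whether a given GCC version actually transforms any of these depends on the optimization level and flags, so treat the comments as assumptions rather than guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* 1-byte fill: semantically the same as memset(dst, val, n). */
void fill8(uint8_t *dst, uint8_t val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = val;
}

/* 2-byte fill: stores the same 16-bit value n times. This only equals
   a plain memset when both bytes of val are identical (e.g. 0). */
void fill16(uint16_t *dst, uint16_t val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = val;
}

/* 2-byte copy: equivalent to memcpy(dst, src, n * sizeof(uint16_t)),
   assuming the two regions do not overlap (hence restrict). */
void copy16(uint16_t *restrict dst, const uint16_t *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

A call such as fill16(buf, 0x1234, 4) writes four 16-bit elements; a hypothetical 2-byte memset builtin would take the element count and a 16-bit value instead of a byte count and a byte value.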
The same question goes for 2-byte strlen. I didn't see a recognition
pattern for strlen (apart from optimizations due to known string length
in tree-ssa-strlen.c). Would it make sense to include strlen recognition
and subsequent handling for 2-byte strlen? The situation might of
course be more complicated than for memset because of encodings etc. My snippet
in question used a fixed-length encoding of 2 bytes, however.
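As a rough sketch of what a 2-byte strlen loop might look like (this is not the original snippet, and the name strlen16 is invented here), assuming a fixed-length 2-byte encoding terminated by a 16-bit zero:

```c
#include <stddef.h>
#include <stdint.h>

/* Counts 16-bit code units up to, but not including, the first
   zero unit. Assumes s points to a zero-terminated array. */
size_t strlen16(const uint16_t *s)
{
    const uint16_t *p = s;
    while (*p != 0)
        p++;
    return (size_t)(p - s);
}
```

The result is the number of 2-byte units, not the number of bytes. A recognition pattern would have to match the element-wise compare against zero and the pointer difference, which is the part that has no 1-byte library call to fall back on.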
Another simple idea to tackle this would be a peephole optimization but
I'm not sure if this is really feasible for something like memset.
Wouldn't the peephole have to be recursive then?