- GCC's code generation for memcpy/memset is often quite suboptimal, leading a number of projects to inline custom versions of these routines, especially on x86 and x86-64. This work involves a rewrite of the x86/x86-64 string routine expansion. The new implementation allows tuning for a specific subtarget via a simple benchmark trick, without guessing how to interpret code optimization guides. Machine-independent changes involve adding profile feedback infrastructure for measuring the expected size and alignment of the block, and a couple of minor builtin folding improvements.
- Jan Hubička
- Patch is available now; it just needs to be decomposed.
- Up to 200% speedups in microbenchmarks on different block sizes. Over 10% speedups for ASPI and Mesa with profile feedback on Opteron.
- Mostly independent steps:
- i386 expansion rewrite + utility
- profile feedback support (there is a problem with attaching the information to GIMPLE. At the moment memcpy/memset calls are replaced by calls to new builtins memcpy_hints/memset_hints that take the histogram as extra arguments. Can't think of a better way until the GIMPLE memory representation is reorganized.)
- builtin folding improvements.
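
To make the profile feedback step above more concrete, here is a minimal sketch in C of the kind of data it could collect: a histogram of block sizes plus a count of misaligned destinations, gathered by a wrapper around memcpy. This is an illustration only, not the patch's implementation. Every name here (`block_profile`, `profiled_memcpy`, `pick_strategy`, the strategy enum) is hypothetical, and the 16-byte alignment check and the size thresholds are arbitrary placeholders rather than tuned values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; this is not GCC's profiling runtime. */
#define SIZE_BUCKETS 16

struct block_profile {
    /* count[i] holds sizes with floor(log2(size)) == i; sizes 0 and 1 share
       bucket 0, and the last bucket catches everything larger. */
    unsigned long count[SIZE_BUCKETS];
    unsigned long misaligned; /* destinations not 16-byte aligned (assumed threshold) */
    unsigned long total;      /* number of recorded calls */
};

/* Map a block size to its power-of-two bucket. */
static unsigned size_bucket(size_t n)
{
    unsigned b = 0;
    while (n > 1 && b < SIZE_BUCKETS - 1) {
        n >>= 1;
        b++;
    }
    return b;
}

/* Record size and destination alignment, then do the real copy. */
static void *profiled_memcpy(struct block_profile *p, void *dst,
                             const void *src, size_t n)
{
    assert(p != NULL);
    p->count[size_bucket(n)]++;
    p->total++;
    if (((uintptr_t)dst & 15) != 0)
        p->misaligned++;
    return memcpy(dst, src, n);
}

/* Return the most frequently seen size bucket (lowest index wins ties). */
static unsigned dominant_bucket(const struct block_profile *p)
{
    unsigned best = 0;
    for (unsigned i = 1; i < SIZE_BUCKETS; i++)
        if (p->count[i] > p->count[best])
            best = i;
    return best;
}

enum copy_strategy { COPY_INLINE_LOOP, COPY_REP_MOVS, COPY_LIBCALL };

/* Pick an expansion strategy from the profile. The cut-offs are
   placeholders; real values would come from per-subtarget benchmarks. */
static enum copy_strategy pick_strategy(const struct block_profile *p)
{
    unsigned b = dominant_bucket(p);
    if (b < 5)
        return COPY_INLINE_LOOP; /* mostly under 32 bytes */
    if (b < 12)
        return COPY_REP_MOVS;    /* mostly under 4 KiB */
    return COPY_LIBCALL;
}
```

In a training run, each copy would go through `profiled_memcpy`, and a later compile could consult `pick_strategy` per call site. In this sketch the histogram lives in an ordinary struct; how the real patch passes it to the memcpy_hints/memset_hints builtins is not shown here.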