[PATCH][AARCH64] Emulating aligned mask loads on AArch64

Fri Sep 18 10:41:00 GMT 2015

This patch uses max reductions to emulate aligned masked loads on AArch64.
It reduces the mask to a scalar that is nonzero if any mask element is true,
then uses that scalar to select between the real address and a scratchpad
address.

The idea is that if the vector load is aligned, it cannot cross a page
boundary and so cannot partially fault.  If any mask element is true, the
original code would have accessed that page anyway, so it is safe to load
the whole vector from the address and use only some of the result.
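
The scheme above can be sketched in scalar C.  This is only an
illustration of the idea, not the code the patch generates: the lane
count, the names (scratchpad, mask_any, emulated_masked_load) and the
choice to zero inactive lanes are all assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4  /* assumed vector width: 4 x 32-bit lanes */

/* Hypothetical scratchpad: aligned memory that is always safe to read.
   It is the load target when no mask element is true.  */
static _Alignas(16) int32_t scratchpad[LANES];

/* Max reduction over the mask: the result is nonzero if and only if
   at least one lane is active.  */
static uint32_t mask_any(const uint32_t mask[LANES])
{
    uint32_t m = 0;
    for (int i = 0; i < LANES; i++)
        if (mask[i] > m)
            m = mask[i];
    return m;
}

/* Emulate a masked load from ADDR, which is assumed to be aligned to
   the vector size whenever any lane is active.  */
static void emulated_masked_load(int32_t dst[LANES], const int32_t *addr,
                                 const uint32_t mask[LANES])
{
    /* Use the reduced mask to pick the real address or the scratchpad.  */
    const int32_t *src = mask_any(mask) ? addr : scratchpad;

    /* Stand-in for one full-width, unmasked aligned vector load.  */
    int32_t tmp[LANES];
    memcpy(tmp, src, sizeof tmp);

    /* Keep only the active lanes.  Zeroing the inactive lanes is an
       assumption of this sketch.  */
    for (int i = 0; i < LANES; i++)
        dst[i] = mask[i] ? tmp[i] : 0;
}
```

With an all-false mask the function never dereferences addr, so the
address may be invalid in that case; with at least one true element the
whole aligned vector is read and the inactive lanes are discarded.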

The patch provided a 15% speed improvement for simple microbenchmarks.

Several spec2k6 benchmarks were affected by the patch: 400.perlbench,
403.gcc, 436.cactusADM, 454.calculix and 464.h264.  However, the changes
had no measurable effect on performance.

Regression-tested on x86_64-linux-gnu, aarch64-linux-gnu and
arm-linux-gnueabi.

Thanks,
Pawel
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150918/70770e67/attachment.ksh>