This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH][AARCH64] Emulating aligned mask loads on AArch64


This patch uses max reductions to emulate aligned masked loads on AArch64.
It reduces the mask to a scalar that is nonzero if any mask element is true,
then uses that scalar to select between the real address and a scratchpad
address.
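
As a rough illustration of the scheme described above (not the code in the
attached patch), the selection can be modelled in scalar C.  The names
emulated_masked_load, mask_any, scratchpad and LANES are hypothetical.  The
sketch assumes 4 x 32-bit lanes, a mask whose true lanes are nonzero, and
zero as the value of inactive lanes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4  /* assumed: 4 x 32-bit lanes, i.e. one 128-bit vector */

/* Hypothetical scratchpad: always-readable, vector-aligned dummy storage.  */
static _Alignas(16) int32_t scratchpad[LANES];

/* Max reduction over the mask: nonzero iff at least one lane is true.  */
static uint32_t mask_any(const uint32_t mask[LANES])
{
    uint32_t m = 0;
    for (int i = 0; i < LANES; i++)
        if (mask[i] > m)
            m = mask[i];
    return m;
}

/* Model of the emulated masked load.  ADDR is only dereferenced when at
   least one mask lane is true; otherwise the scratchpad is read instead.  */
static void emulated_masked_load(int32_t dst[LANES], const int32_t *addr,
                                 const uint32_t mask[LANES])
{
    const int32_t *src = mask_any(mask) ? addr : scratchpad;
    int32_t tmp[LANES];

    /* The scheme relies on the selected address being vector-aligned.  */
    assert((uintptr_t)src % sizeof tmp == 0);

    /* Stands in for a single full-width, unmasked vector load.  */
    memcpy(tmp, src, sizeof tmp);

    /* Keep the active lanes; inactive lanes are zeroed (an assumption).  */
    for (int i = 0; i < LANES; i++)
        dst[i] = mask[i] ? tmp[i] : 0;
}
```

With an all-false mask the original address is never touched, so it may be
invalid without causing a fault in this model.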

The idea is that an aligned vector load cannot cross a page boundary and
so cannot partially fault.  If any mask element is true, the page holding
that element must be accessible, so it is safe to load the whole vector
from the address and use only some of the result.
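
The page-boundary argument can be checked with simple address arithmetic.
The helper below is a hypothetical sketch, not part of the patch.  The
4096-byte page and 16-byte vector used in the usage note are assumed example
values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if the access [addr, addr + size) touches two pages.
   Assumes size >= 1 and that page_size is a power of two.  */
static bool crosses_page(uintptr_t addr, uintptr_t size, uintptr_t page_size)
{
    assert(size >= 1);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* Compare the page index of the first and last byte accessed.  */
    return (addr / page_size) != ((addr + size - 1) / page_size);
}
```

For example, crosses_page(4080, 16, 4096) is false because 4080 is a multiple
of 16 and the access ends at byte 4095, while the unaligned access
crosses_page(4088, 16, 4096) is true.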

The patch provided a 15% speed improvement for simple microbenchmarks.

Several spec2k6 benchmarks were affected by the patch: 400.perlbench,
403.gcc, 436.cactusADM, 454.calculix and 464.h264.  However, the changes
had no measurable effect on performance.

Regression-tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.

Thanks,
Pawel

Attachment: patch
Description: Text document

