This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP
- From: Evandro Menezes <e dot menezes at samsung dot com>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: Marcus Shawcroft <marcus dot shawcroft at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, James Greenhalgh <james dot greenhalgh at arm dot com>
- Date: Wed, 13 Jul 2016 15:10:31 -0500
- Subject: Re: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP
- Authentication-results: sourceware.org; auth=none
- References: <578668FD.email@example.com>
On 07/13/16 11:14, Kyrill Tkachov wrote:
The most common way to load and store a TImode value in aarch64 is to
perform an LDP/STP of two X-registers.
This is the *movti_aarch64 pattern in aarch64.md.
There is a bug in the logic in aarch64_classify_address where it
validates the offset in the address used
to load a TImode value. It passes down TImode to the
aarch64_offset_7bit_signed_scaled_p check which rejects
offsets that are not a multiple of the mode size of TImode (16).
However, this is too conservative as X-reg LDP/STP
instructions accept immediate offsets that are a multiple of 8.
Also, considering that the definition of
aarch64_offset_7bit_signed_scaled_p is:
  return (offset >= -64 * GET_MODE_SIZE (mode)
          && offset < 64 * GET_MODE_SIZE (mode)
          && offset % GET_MODE_SIZE (mode) == 0);
I think the range check may even be wrong for TImode as this will
accept offsets in the range [-1024, 1024)
(as long as they are a multiple of 16)
whereas X-reg LDP/STP instructions only accept offsets in the range
[-512, 512).
So since the check is for an X-reg LDP/STP address we should be
passing down DImode.
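
To make the difference concrete, here is a minimal standalone sketch of the
range check quoted above. It is not the GCC source: the function takes the
mode size in bytes directly rather than a machine mode, and the names
DI_SIZE, TI_SIZE and offset_7bit_signed_scaled_p are hypothetical stand-ins
for GET_MODE_SIZE (DImode), GET_MODE_SIZE (TImode) and the real helper.

```c
#include <stdbool.h>

/* Assumed mode sizes in bytes: DImode is 8, TImode is 16.  */
enum { DI_SIZE = 8, TI_SIZE = 16 };

/* Same arithmetic as the quoted check, parameterised on the mode size
   in bytes instead of a machine mode (an assumption made so the sketch
   compiles outside GCC).  */
bool
offset_7bit_signed_scaled_p (long long mode_size, long long offset)
{
  return (offset >= -64 * mode_size
          && offset < 64 * mode_size
          && offset % mode_size == 0);
}
```

With TI_SIZE the check rejects an offset of 8 (not a multiple of 16) yet
accepts 512, whereas with DI_SIZE it accepts 8 and rejects 512, which is
the behaviour argued for above.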
This patch does that and enables more aggressive generation of REG+IMM
addressing modes for 64-bit aligned
TImode values, eliminating many address calculation instructions.
For the testcase in the patch we currently generate:
        add     x1, x1, 8
        add     x0, x0, 8
        ldp     x2, x3, [x1]
        stp     x2, x3, [x0]
whereas with this patch we generate:
        ldp     x2, x3, [x1, 8]
        stp     x2, x3, [x0, 8]
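
For readers without the patch at hand, the kind of source that places a
TImode value at an 8-byte offset could look like the sketch below. This is
an illustration only, not necessarily the testcase in the patch: the struct
and function names (foo, bar) are hypothetical, it relies on the GCC
extensions __int128 and __attribute__ ((packed)), and whether a given
compiler emits exactly the sequences above depends on its version and
options.

```c
/* Hypothetical example: the packed attribute puts the 16-byte member 'a'
   at byte offset 8, so it is 64-bit aligned but not 128-bit aligned.  */
struct foo
{
  long long b;
  __int128 a;
} __attribute__ ((packed));

/* Copies the TImode member; on aarch64 the load and store would each
   address base register + 8.  */
void
bar (struct foo *p, struct foo *q)
{
  p->a = q->a;
}
```

Here p would arrive in x0 and q in x1 under the AArch64 procedure call
standard, which matches the base registers in the sequences shown.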
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?