This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH 0/2] [ARM] PR61551 addressing mode costs


From: Charles Baylis <charles.baylis@linaro.org>

Hi Ramana,

This patch set continues previous work on fixing the cost calculations for MEMs
which use different addressing modes. It implements the approach we discussed
at Linaro Connect BKK16.

I have included some notes on the patch set as follows:


Background:

The motivating problem is that this function:
  char *f(char *p, int8x8x4_t v, int r) { vst4_s8(p, v); p+=32; return p; }
compiles to:
        mov     r3, r0
        adds    r0, r0, #32
        vst4.8  {d0-d3}, [r3]
        bx      lr
but we would like to get:
        vst4.8  {d0-d3}, [r0]!
        bx      lr

Although the ARM back end contains patterns for the write-back forms of these
instructions, they are not currently generated. The auto-inc-dec pass does not
perform this optimisation because arm_rtx_costs incorrectly calculates the cost
of "vst4.8  {d0-d3}, [r0]!" as much higher than that of
"vst4.8  {d0-d3}, [r3]". As a result, the pass considers the POST_INC form to
be worse than the initial sequence of vst4/add and does not perform the
transformation.
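
To make the comparison concrete, here is a minimal sketch of the decision
described above. It is not the auto-inc-dec code: the struct, the function
name and the cost numbers are all hypothetical, and the exact comparison used
by the pass may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost units for the three instructions discussed above.
   These are illustrative values, not what arm_rtx_costs returns.  */
struct seq_costs
{
  int store_plain;    /* vst4.8 {d0-d3}, [r3]  */
  int add_imm;        /* adds r0, r0, #32      */
  int store_post_inc; /* vst4.8 {d0-d3}, [r0]! */
};

/* Return true when the write-back form is no more expensive than the
   plain store followed by a separate add.  */
static bool
prefer_post_inc (const struct seq_costs *c)
{
  assert (c);
  return c->store_post_inc <= c->store_plain + c->add_imm;
}
```

If the POST_INC store is costed far above the plain store, this check fails
and the vst4/add sequence is kept, which matches the behaviour seen today.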

In fact, GCC 6 has regressions compared to GCC 5 in this area, and no longer
does post-indexed addressing for int64_t or 64-bit vector types.


Solution:

Change cost calculation for MEMs so that the cost of the memory access
is computed separately from the cost of the addressing mode. A new
table-driven mechanism is introduced for the costs of the addressing modes.

The first patch in the series implements the calculation of the cost of
the memory access.

The second patch adds the table-driven model of the extra cost of the
selected addressing mode. I don't have access to a lot of CPU pipeline
information, so most CPUs use the generic cost table, with the exception of
Cortex-A57.
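
As a rough illustration of the split described above, the sketch below adds a
per-CPU, table-driven addressing mode cost to a separately computed access
cost. The enum, struct, table names and values are hypothetical and are not
taken from the patches; in particular, example_cpu_costs is not the Cortex-A57
table.

```c
#include <assert.h>

/* Hypothetical addressing mode classes, for illustration only.  */
enum addr_mode_kind
{
  ADDR_REG,          /* [r0]      */
  ADDR_REG_PLUS_IMM, /* [r0, #8]  */
  ADDR_POST_INC,     /* [r0]!     */
  ADDR_PRE_INC,      /* [r0, #8]! */
  ADDR_MODE_COUNT
};

/* Extra cost of each addressing mode, one table per CPU.  */
struct addr_mode_costs
{
  int extra[ADDR_MODE_COUNT];
};

/* A generic table in which no addressing mode costs extra.  */
static const struct addr_mode_costs generic_costs = { { 0, 0, 0, 0 } };

/* A made-up CPU on which write-back forms cost one extra unit.  */
static const struct addr_mode_costs example_cpu_costs = { { 0, 0, 1, 1 } };

/* Total cost of a MEM: the cost of the access itself, computed
   elsewhere, plus the table-driven cost of the addressing mode.  */
static int
mem_cost (int access_cost, enum addr_mode_kind kind,
          const struct addr_mode_costs *table)
{
  assert (kind >= ADDR_REG && kind < ADDR_MODE_COUNT);
  return access_cost + table->extra[kind];
}
```

With a table like generic_costs, a POST_INC store costs the same as the plain
store, so the write-back form is no longer penalised by the address.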


Testing:

I did "make check" on arm-linux-gnueabihf with qemu. This patch fixes one test
failure in lp1243022.c.


Benchmarking:

On Cortex-A15, SPEC2006 and a popular suite of embedded benchmarks perform the
same as before this patch is applied.  This is expected: the gain should be in
code quality for hand-written NEON intrinsics code.



Charles Baylis (2):
  [ARM] Refactor costs calculation for MEM.
  [ARM] Add table of costs for AArch32 addressing modes.

 gcc/config/arm/aarch-common-protos.h |  16 +++++
 gcc/config/arm/aarch-cost-tables.h   |  54 ++++++++++++++--
 gcc/config/arm/arm.c                 | 120 ++++++++++++++++++++++++++---------
 3 files changed, 154 insertions(+), 36 deletions(-)

-- 
2.7.4

