Bug 63724 - [AArch64] Inefficient immediate expansion and hoisting.
Summary: [AArch64] Inefficient immediate expansion and hoisting.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 5.0
Importance: P3 normal
Target Milestone: 5.0
Assignee: Ramana Radhakrishnan
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2014-11-03 15:40 UTC by Ramana Radhakrishnan
Modified: 2015-01-11 18:37 UTC
CC List: 0 users

See Also:
Host:
Target: aarch64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-11-03 00:00:00


Description Ramana Radhakrishnan 2014-11-03 15:40:23 UTC
For some cases, like hmmer in SPEC2k6, we currently generate pretty rubbish code for AArch64. A reduced example:

float
P7Viterbi(int **mmx, int L, int M, int **imx, int **dmx)
{
  int k;

  for (k = 0; k <= M; k++)
    mmx[0][k] = imx[0][k] = dmx[0][k] = -987654321;

  return 0.0f;	/* matches the fmov s0, wzr in the generated code */
}

At -O2 this ends up generating the following code, which rebuilds the constant three times inside the loop:

P7Viterbi:
	tbnz	w2, #31, .L4
	ldr	x5, [x3]
	ldr	x4, [x4]
	ldr	x6, [x0]
	mov	x0, 0
.L3:
	mov	w1, 38735
	mov	w3, w1
	movk	w1, 0xc521, lsl 16
	str	w1, [x4, x0, lsl 2]
	movk	w3, 0xc521, lsl 16
	mov	w1, 38735
	str	w3, [x5, x0, lsl 2]
	movk	w1, 0xc521, lsl 16
	str	w1, [x6, x0, lsl 2]
	add	x0, x0, 1
	cmp	w2, w0
	bge	.L3
.L4:
	fmov	s0, wzr
	ret
	.size	P7Viterbi, .-P7Viterbi

and could well be:

P7Viterbi:
        tbnz    w2, #31, .L4
        ldr     x5, [x3]
        mov     w1, 38735
        ldr     x3, [x4]
        movk    w1, 0xc521, lsl 16
        ldr     x6, [x0]
        mov     x0, 0
.L3:
        str     w1, [x3, x0, lsl 2]
        str     w1, [x5, x0, lsl 2]
        str     w1, [x6, x0, lsl 2]
        add     x0, x0, 1
        cmp     w2, w0
        bge     .L3
.L4:
        fmov    s0, wzr
        ret
        .size   P7Viterbi, .-P7Viterbi
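
At the source level, the second listing corresponds roughly to the sketch below, where the constant lives in a single local that is set up before the loop. The function name fill_rows and its flattened pointer parameters are hypothetical and not part of the original test case. Writing the source this way does not force the compiler to hoist anything; it only illustrates the shape of the code we would like to get.

```c
/* Hypothetical helper, not from the original test case: the same three
   stores as P7Viterbi, with the row pointers already loaded and the
   constant held in one local.  */
void
fill_rows (int *mmx0, int *imx0, int *dmx0, int M)
{
  const int init = -987654321;	/* ideally built once, ahead of the loop */
  int k;

  for (k = 0; k <= M; k++)
    {
      dmx0[k] = init;
      imx0[k] = init;
      mmx0[k] = init;
    }
}
```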

The hoisting is missed because we expand const_ints too early in the AArch64 backend. Given that we don't have an "uncse" pass in the mid-end, it is quite hard to recover once we have expanded to this form so early in the compiler. The simple solution is to move the logic out into a separate splitter function. We should also investigate what happens if we start doing the same for our address computations, but that is the subject of a separate patch.
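
To make the mov/movk pair in the listings concrete, here is a small sketch of how a 32-bit immediate splits into the two 16-bit halves those instructions carry. The helper names imm_lo16, imm_hi16 and imm_rebuild are made up for illustration and are not functions from the GCC sources.

```c
#include <stdint.h>

/* Low 16 bits: the operand of the initial mov.  */
uint16_t
imm_lo16 (int32_t imm)
{
  return (uint16_t) ((uint32_t) imm & 0xffffu);
}

/* High 16 bits: the operand of movk ..., lsl 16.  */
uint16_t
imm_hi16 (int32_t imm)
{
  return (uint16_t) ((uint32_t) imm >> 16);
}

/* Rebuild the 32-bit pattern the way mov followed by movk does:
   start from the low half, then overwrite bits 16..31.  */
uint32_t
imm_rebuild (uint16_t lo, uint16_t hi)
{
  uint32_t reg = lo;					/* mov  w1, lo */
  reg = (reg & 0x0000ffffu) | ((uint32_t) hi << 16);	/* movk w1, hi, lsl 16 */
  return reg;
}
```

For -987654321 the halves are 38735 (0x974f) and 0xc521, which are the operands seen in both listings.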

Mine.
Comment 1 Ramana Radhakrishnan 2014-11-03 15:40:54 UTC
Mine.
Comment 2 Ramana Radhakrishnan 2014-11-14 09:59:00 UTC
Fixed by r217546
Comment 3 Ramana Radhakrishnan 2014-11-14 11:03:32 UTC
Author: ramana
Revision: 217546
Modified property: svn:log

Modified: svn:log at Fri Nov 14 11:03:00 2014
------------------------------------------------------------------------------
--- svn:log (original)
+++ svn:log Fri Nov 14 11:03:00 2014
@@ -1,1 +1,14 @@
-Fix typo in *<arith_shift_insn>_shiftsi
+Fix PR target/63724
+
+2014-11-14  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
+
+	PR target/63724
+        * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Split out
+        numerical immediate handling to...
+        (aarch64_internal_mov_immediate): ...this. New.
+        (aarch64_rtx_costs): Use aarch64_internal_mov_immediate.
+        (aarch64_mov_operand_p): Relax predicate.
+        * config/aarch64/aarch64.md (mov<mode>:GPI): Do not expand CONST_INTs.
+        (*movsi_aarch64): Turn into define_insn_and_split and new alternative
+        for 'n'.
+        (*movdi_aarch64): Likewise.
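
The ChangeLog above says aarch64_rtx_costs now uses the same helper as the expander. As a rough illustration of why a shared helper is useful for costing, the sketch below counts how many mov/movk instructions a plain chunk-by-chunk sequence would need. The name count_mov_insns and the counting rule are assumptions made for this example only; the real backend also knows other encodings (for example inverted and bitmask immediates) that this toy version ignores.

```c
#include <stdint.h>

/* Hypothetical, simplified cost helper: one mov or movk for every
   non-zero 16-bit chunk of the immediate.  WIDTH is 32 or 64.  */
int
count_mov_insns (uint64_t imm, int width)
{
  int insns = 0;
  int shift;

  for (shift = 0; shift < width; shift += 16)
    if ((imm >> shift) & 0xffffu)
      insns++;			/* this chunk needs its own mov/movk */

  return insns ? insns : 1;	/* zero still takes a single mov */
}
```

With this rule the constant from the test case, 0xc521974f as a 32-bit pattern, costs two instructions, so each extra copy left inside the loop adds two instructions per iteration.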
Comment 4 Yvan Roux 2015-01-11 18:37:14 UTC
Author: yroux
Date: Sun Jan 11 18:36:42 2015
New Revision: 219433

URL: https://gcc.gnu.org/viewcvs?rev=219433&root=gcc&view=rev
Log:
gcc/
2015-01-11  Yvan Roux  <yvan.roux@linaro.org>

	Backport from trunk r217362, r217546.
	2014-11-14  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

	PR target/63724
        * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Split out
        numerical immediate handling to...
        (aarch64_internal_mov_immediate): ...this. New.
        (aarch64_rtx_costs): Use aarch64_internal_mov_immediate.
        (aarch64_mov_operand_p): Relax predicate.
        * config/aarch64/aarch64.md (mov<mode>:GPI): Do not expand CONST_INTs.
        (*movsi_aarch64): Turn into define_insn_and_split and new alternative
        for 'n'.
        (*movdi_aarch64): Likewise.

	2014-11-11  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-simd.md
	(aarch64_simd_bsl<mode>_internal): Remove float cases, canonicalize.
	(aarch64_simd_bsl<mode>): Add gen_lowpart expressions where we
	are punning between float vectors and integer vectors.

gcc/testsuite
2015-01-11  Yvan Roux  <yvan.roux@linaro.org>

	Backport from trunk r217362.
	2014-11-11  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.target/aarch64/vbslq_f64_1.c: New.
	* gcc.target/aarch64/vbslq_f64_2.c: Likewise.
	* gcc.target/aarch64/vbslq_u64_1.c: Likewise.
	* gcc.target/aarch64/vbslq_u64_2.c: Likewise.


Added:
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/aarch64/vbslq_u64_2.c
Modified:
    branches/linaro/gcc-4_9-branch/gcc/ChangeLog.linaro
    branches/linaro/gcc-4_9-branch/gcc/config/aarch64/aarch64-simd.md
    branches/linaro/gcc-4_9-branch/gcc/config/aarch64/aarch64.c
    branches/linaro/gcc-4_9-branch/gcc/config/aarch64/aarch64.md
    branches/linaro/gcc-4_9-branch/gcc/testsuite/ChangeLog.linaro