This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH, AARCH64] improve long double 0.0 support
- From: Jim Wilson <jim dot wilson at linaro dot org>
- To: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 3 Jun 2015 17:35:35 -0700
- Subject: [PATCH, AARCH64] improve long double 0.0 support
- Authentication-results: sourceware.org; auth=none
I noticed that poor code is emitted for a long double 0.0. This testcase
long double sub (void) { return 0.0; }
void sub2 (long double *ld) { *ld = 0.0; }
currently generates
sub:
ldr q0, .LC0
ret
...
sub2:
ldr q0, .LC1
str q0, [x0]
ret
where .LC0 and .LC1 are 16-byte long double zeros in the constant pool.
With the attached patch, I get
sub:
movi v0.2d, #0
ret
...
sub2:
stp xzr, xzr, [x0]
ret
The main problem is in aarch64_valid_floating_const, which rejects all
constants for TFmode. There is a comment that says we should handle
0, but not until after the movtf pattern is improved. That
improvement happened two years ago, with this patch:
2013-05-09 Sofiane Naci <sofiane.naci@arm.com>
* config/aarch64/aarch64.md: New movtf split.
...
so this comment is no longer relevant, and we should handle 0 now.
The patch deletes the out of date comment and moves the 0 check before
the TFmode check so that TFmode 0 is accepted.
There are a few other changes needed to make this work well. The
movtf expander needs to avoid forcing 0 to a reg for a mem dest, just
as the movti pattern already does. The Ump/?rY alternative needs to
be split in two, because %H doesn't work for const_double 0; again,
this matches the movti pattern. The insn condition needs to allow 0
values in operand 1, as is done in the movti pattern.
I noticed another related problem while making this change. The
ldp/stp instructions in the movtf_aarch64 pattern have neon attribute
types. However, these are integer instructions with matching 'r'
constraints and hence should use the load2/store2 attribute types,
just like the movti pattern.
This was tested with a default-languages make bootstrap and make
check on an APM system.
There are some similar problems with the movsf and movdf patterns. I
plan to submit a patch to fix them after this one is accepted.
Jim
2015-06-03 Jim Wilson <jim.wilson@linaro.org>
* config/aarch64/aarch64.c (aarch64_valid_floating_const): Move
aarch64_float_const_zero_rtx_p check before TFmode check.
* config/aarch64/aarch64.md (movtf): Don't call force_reg if op1 is
an fp zero.
(movtf_aarch64): Separate ?rY alternative into two. Adjust assembly
code and attributes to match. Change condition from register_operand
to aarch64_reg_or_fp_zero for op1. Change type for ldp from
neon_load1_2reg to load2. Change type for stp from neon_store1_2reg
to store2.
Index: config/aarch64/aarch64.c
===================================================================
--- config/aarch64/aarch64.c (revision 224054)
+++ config/aarch64/aarch64.c (working copy)
@@ -7430,16 +7430,13 @@ aarch64_valid_floating_const (machine_mo
if (!CONST_DOUBLE_P (x))
return false;
- /* TODO: We could handle moving 0.0 to a TFmode register,
- but first we would like to refactor the movtf_aarch64
- to be more amicable to split moves properly and
- correctly gate on TARGET_SIMD. For now - reject all
- constants which are not to SFmode or DFmode registers. */
+ if (aarch64_float_const_zero_rtx_p (x))
+ return true;
+
+ /* We only handle moving 0.0 to a TFmode register. */
if (!(mode == SFmode || mode == DFmode))
return false;
- if (aarch64_float_const_zero_rtx_p (x))
- return true;
return aarch64_float_const_representable_p (x);
}
Index: config/aarch64/aarch64.md
===================================================================
--- config/aarch64/aarch64.md (revision 224053)
+++ config/aarch64/aarch64.md (working copy)
@@ -1040,18 +1040,20 @@ (define_expand "movtf"
FAIL;
}
- if (GET_CODE (operands[0]) == MEM)
+ if (GET_CODE (operands[0]) == MEM
+ && ! (GET_CODE (operands[1]) == CONST_DOUBLE
+ && aarch64_float_const_zero_rtx_p (operands[1])))
operands[1] = force_reg (TFmode, operands[1]);
"
)
(define_insn "*movtf_aarch64"
[(set (match_operand:TF 0
- "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump")
+ "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump,Ump")
(match_operand:TF 1
- "general_operand" " w,?r, ?r,w ,Y,Y ,m,w,Ump,?rY"))]
+ "general_operand" " w,?r, ?r,w ,Y,Y ,m,w,Ump,?r ,Y"))]
"TARGET_FLOAT && (register_operand (operands[0], TFmode)
- || register_operand (operands[1], TFmode))"
+ || aarch64_reg_or_fp_zero (operands[1], TFmode))"
"@
orr\\t%0.16b, %1.16b, %1.16b
#
@@ -1062,12 +1064,13 @@ (define_insn "*movtf_aarch64"
ldr\\t%q0, %1
str\\t%q1, %0
ldp\\t%0, %H0, %1
- stp\\t%1, %H1, %0"
+ stp\\t%1, %H1, %0
+ stp\\txzr, xzr, %0"
[(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,fconstd,fconstd,\
- f_loadd,f_stored,neon_load1_2reg,neon_store1_2reg")
- (set_attr "length" "4,8,8,8,4,4,4,4,4,4")
- (set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*")
- (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*")]
+ f_loadd,f_stored,load2,store2,store2")
+ (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
+ (set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*,*")
+ (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*,*")]
)
(define_split