GCC manual, section "5.38.4 Constraints for Particular Machines":

"ARM family—‘config/arm/arm.h’
f    Floating-point register
w    VFP floating-point register
F    One of the floating-point constants 0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0 or 10.0
..."

The "w" constraint allows single-precision VFP floating-point registers to be used, but it does not work for double precision.
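For reference, the single-precision case that does work can be sketched roughly as follows. This is only an illustration: the function names `add_float` and `check_add_float` and the preprocessor guard are my own assumptions, not taken from the manual, and the plain C fallback exists only so the snippet also builds on targets without VFP.

```c
#include <assert.h>

/* Adds two floats. On 32-bit ARM with single-precision VFP hardware,
 * the "w" constraint asks GCC to place each operand in a VFP register,
 * which is printed as an S register for a float operand.
 * The guard below is an assumption; adjust it to your toolchain. */
float add_float(float b, float c)
{
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x04)
    float a;
    __asm__ ("vadd.f32 %0, %1, %2"
             : "=w" (a)
             : "w" (b), "w" (c));
    return a;
#else
    /* Fallback for targets without VFP (hypothetical, for portability). */
    return b + c;
#endif
}

/* Small self-check; the values are exactly representable as floats. */
void check_add_float(void)
{
    assert(add_float(1.5f, 2.25f) == 3.75f);
}
```

Replacing `float` with `double` in the same pattern is the case reported here as not working.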
Well, it looks like this is not a missing feature, just incomplete documentation :) It is possible to use double-precision floating-point registers and NEON 128-bit registers in the following way:

----------------------------------------------------------
#include <arm_neon.h>

int16x8_t test_neon(int16x8_t b, int16x8_t c)
{
    int16x8_t a;
    asm (
        "vadd.i32 %q0, %q1, %q2 \n\t"
        : "=w" (a)
        : "w" (b), "w" (c)
    );
    return a;
}

double test_double(double b, double c)
{
    double a;
    asm (
        "faddd %P0, %P1, %P2 \n\t"
        : "=w" (a)
        : "w" (b), "w" (c)
    );
    return a;
}
----------------------------------------------------------

Disassembly of section .text:

00000000 <test_quad>:
   0:   e52db004    push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000    add     fp, sp, #0      ; 0x0
   8:   ec410b12    vmov    d2, r0, r1
   c:   ec432b13    vmov    d3, r2, r3
  10:   ed9b6b01    vldr    d6, [fp, #4]
  14:   ed9b7b03    vldr    d7, [fp, #12]
  18:   f2224846    vadd.i32 q2, q1, q3
  1c:   ec510b14    vmov    r0, r1, d4
  20:   ec532b15    vmov    r2, r3, d5
  24:   e28bd000    add     sp, fp, #0      ; 0x0
  28:   e8bd0800    pop     {fp}
  2c:   e12fff1e    bx      lr

00000030 <test_double>:
  30:   ec410b15    vmov    d5, r0, r1
  34:   e52db004    push    {fp}            ; (str fp, [sp, #-4]!)
  38:   ec432b16    vmov    d6, r2, r3
  3c:   e28db000    add     fp, sp, #0      ; 0x0
  40:   ee357b06    faddd   d7, d5, d6
  44:   ec510b17    vmov    r0, r1, d7
  48:   e28bd000    add     sp, fp, #0      ; 0x0
  4c:   e8bd0800    pop     {fp}
  50:   e12fff1e    bx      lr
The documentation needs a tweak to cover all the extra operand-printing modifiers available in inline assembler.
As of today, gcc seems to be clever enough to deduce whether to use a single-precision or a double-precision VFP register when given the "w" constraint (so the P modifier is not strictly needed). This behavior seems to have been introduced in GCC 4.3.2. However, trying to force double-precision variables into specific VFP registers breaks it:

/************/
#include <stdio.h>
#include <stdint.h>

inline int32_t double_to_fixed_16_16(double dbl)
{
    int32_t fix;
    register double tmp asm ("d0") = dbl;
    asm volatile (
        "vcvt.s32.f64 %1, %1, #16\n"
        "vmov.f32 %0, %1[0]\n"
        : "=r" (fix), "+&w" (tmp)
    );
    return fix;
}

int main()
{
    int32_t i = double_to_fixed_16_16(1.5);
    printf("%08X\n", i);
}
/************/

/tmp/ccYfabov.s: Assembler messages:
/tmp/ccYfabov.s:24: Error: operand size must match register width -- `vcvt.s32.f64 s0,s0,#16'
/tmp/ccYfabov.s:25: Error: only D registers may be indexed -- `vmov.f32 r0,s0[0]'
/tmp/ccYfabov.s:45: Error: operand size must match register width -- `vcvt.s32.f64 s0,s0,#16'
/tmp/ccYfabov.s:46: Error: only D registers may be indexed -- `vmov.f32 r2,s0[0]'

Also, NEON quad registers still need an explicit 'q' modifier in inline assembly. Updating the issue summary because NEON quad registers are now more problematic than VFP doubles.

Thanks for your work on gcc. VFP/NEON support is slowly getting better over time.
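A possible workaround for the register-pinning failure above is to spell out the 'P' modifier, so that the operand is printed as a D register no matter how the variable was bound. The following is a minimal sketch under that assumption and has not been verified across GCC versions. The preprocessor guard, the use of `vmov.32` in place of `vmov.f32`, and the plain C fallback are my own choices for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a double to signed 16.16 fixed point, truncating toward zero. */
int32_t double_to_fixed_16_16(double dbl)
{
#if defined(__arm__) && defined(__ARM_NEON) && \
    defined(__ARM_FP) && (__ARM_FP & 0x08)
    int32_t fix;
    /* Pin the value to d0, as in the failing test case. */
    register double tmp __asm__ ("d0") = dbl;
    __asm__ volatile (
        /* %P1 forces the D-register name (d0) instead of s0. */
        "vcvt.s32.f64 %P1, %P1, #16\n\t"
        /* Move the low 32 bits of the D register to a core register. */
        "vmov.32      %0, %P1[0]\n\t"
        : "=r" (fix), "+w" (tmp)
    );
    return fix;
#else
    /* Fallback for non-ARM or non-VFPv3 targets (assumption, for portability):
     * the cast truncates toward zero, like the fixed-point vcvt. */
    return (int32_t)(dbl * 65536.0);
#endif
}

/* Small self-check: 1.5 in 16.16 fixed point is 0x00018000. */
void check_double_to_fixed(void)
{
    assert(double_to_fixed_16_16(1.5) == 0x00018000);
}
```

The same idea applies to NEON quad registers with the 'q' modifier, which the thread reports is still required.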
*** Bug 94054 has been marked as a duplicate of this bug. ***