Bug 37188 - Missing documentation about the use of ARM NEON quad registers in inline asm arguments
Missing documentation about the use of ARM NEON quad registers in inline asm ...
Status: NEW
Product: gcc
Classification: Unclassified
Component: inline-asm
unknown
: P3 normal
: ---
Assigned To: Not yet assigned to anyone
: documentation
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-08-21 13:29 UTC by Siarhei Siamashka
Modified: 2010-03-20 19:09 UTC (History)
1 user (show)

See Also:
Host: i486-linux-gnu
Target: arm-softfloat-linux-gnueabi
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-04-29 16:09:48


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Siarhei Siamashka 2008-08-21 13:29:32 UTC
Gcc manual, "5.38.4 Constraints for Particular Machines" section:

"ARM family—‘config/arm/arm.h’
          f        Floating-point register
          w        VFP floating-point register
          F        One of the floating-point constants 0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0 or
                   10.0
..."

Using "w" constraint allows to use single precision VFP floating point registers. But this does not work for double precision.
Comment 1 Siarhei Siamashka 2008-09-02 15:50:45 UTC
Well, looks like it is not a missing feature, but just incompleteness of documentation :)

It is possible to use double precision floating point registers and NEON 128-bit registers in the following way:

----------------------------------------------------------

#include <arm_neon.h>

int16x8_t test_neon(int16x8_t b, int16x8_t c)
{
    int16x8_t a;
    asm (
        "vadd.i32 %q0, %q1, %q2 \n\t"
        : "=w" (a)
        : "w" (b), "w" (c)
    );
    return a;
}

double test_double(double b, double c)
{
    double a;
    asm (
        "faddd %P0, %P1, %P2 \n\t"
        : "=w" (a)
        : "w" (b), "w" (c)
    );
    return a;
}

----------------------------------------------------------

Disassembly of section .text:

00000000 <test_quad>:
   0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000        add     fp, sp, #0      ; 0x0
   8:   ec410b12        vmov    d2, r0, r1
   c:   ec432b13        vmov    d3, r2, r3
  10:   ed9b6b01        vldr    d6, [fp, #4]
  14:   ed9b7b03        vldr    d7, [fp, #12]
  18:   f2224846        vadd.i32        q2, q1, q3
  1c:   ec510b14        vmov    r0, r1, d4
  20:   ec532b15        vmov    r2, r3, d5
  24:   e28bd000        add     sp, fp, #0      ; 0x0
  28:   e8bd0800        pop     {fp}
  2c:   e12fff1e        bx      lr

00000030 <test_double>:
  30:   ec410b15        vmov    d5, r0, r1
  34:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
  38:   ec432b16        vmov    d6, r2, r3
  3c:   e28db000        add     fp, sp, #0      ; 0x0
  40:   ee357b06        faddd   d7, d5, d6
  44:   ec510b17        vmov    r0, r1, d7
  48:   e28bd000        add     sp, fp, #0      ; 0x0
  4c:   e8bd0800        pop     {fp}
  50:   e12fff1e        bx      lr
Comment 2 Ramana Radhakrishnan 2009-04-29 16:09:48 UTC
Needs a documentation tweak for all the extra bits in the inline assembler for printing operands.


Comment 3 Siarhei Siamashka 2010-03-14 12:44:26 UTC
As of today, gcc seems to be clever enough to deduct whether to use single precision or double precision VFP register when given "w" constraint (so P modifier is not strictly needed). This behavior seems to have been introduced in 4.3.2 gcc version.

However, trying to force double precision variables into specific VFP registers breaks it:

/************/
#include <stdio.h>
#include <stdint.h>

inline int32_t double_to_fixed_16_16(double dbl)
{
    int32_t fix;
    register double tmp asm ("d0") = dbl;
    asm volatile (
        "vcvt.s32.f64  %1, %1, #16\n"
        "vmov.f32      %0, %1[0]\n"
        : "=r" (fix), "+&w" (tmp)
    );
    return fix;
}

int main()
{
    int32_t i = double_to_fixed_16_16(1.5);
    printf("%08X\n", i);
}
/************/

/tmp/ccYfabov.s: Assembler messages:
/tmp/ccYfabov.s:24: Error: operand size must match register width -- `vcvt.s32.f64 s0,s0,#16'
/tmp/ccYfabov.s:25: Error: only D registers may be indexed -- `vmov.f32 r0,s0[0]'
/tmp/ccYfabov.s:45: Error: operand size must match register width -- `vcvt.s32.f64 s0,s0,#16'
/tmp/ccYfabov.s:46: Error: only D registers may be indexed -- `vmov.f32 r2,s0[0]'

Also NEON quad registers still need explicit 'q' modifier in inline assembly. Updating the issue summary because NEON quad registers are now more problematic than VFP doubles.


Thanks for your work on gcc. VFP/NEON support is slowly getting better over time.