This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Inline function in GCC


On Friday 06 August 2004 11:26, Venkatachala Upadhya wrote:
> Hello GCC people,
>
> I am working on audio Codecs. I am using Linux 2.4.20 kernel on
> ARM926EJS processor. The tool chain used is GCC 3.3.1 and the binutils
> 2.14. The codec code is optimised for ARMEJS processor using ADS suite
> version 1.2. This optimised code is having C inline functions with
> inline assembly statements. I have given here one inline C function with
> inline assembly code for reference.
>
> __inline Word16 add(Word16, Word16);
>
>
> __inline Word16 add(Word16 x, Word16 y)
> {
>     Word32 xs, ys;
>     Word16 rs;
>     __asm{
>         mov     xs, x, lsl #16
>         mov     ys, y, lsl #16
>         qadd    xs, xs, ys
>         mov     rs, xs, asr #16
>     }
>     return (rs);
>
> }
>
>
> I have ported the same code to GNU tool chain. I have used the
> C-Language extension with Extended inline assembly feature, available in
> GNU tool chain. The GNU port code is also reproduced here for reference.
>
>
> __inline__ static int add(int, int) __attribute__ ((always_inline));
>
> __inline__ static int add(int x, int y)
> {
>     int xs, ys;
>     __asm__  __volatile__
>     (
>          "mov     %3, %0, lsl #16 \n;"
>          "mov     %4, %2, lsl #16 \n;"
>          "qadd    %3, %3, %4 \n;"
>          "mov     %0, %3, asr #16 \n;"
>
>          : "=r" (x)
>          : "0" (x), "r" (y), "r" (xs), "r" (ys)
>
>     ) ;
>     return (x);
> }

This is wrong. It should be:

         "mov     %1, %3, lsl #16 \n"
         "mov     %2, %4, lsl #16 \n"
         "qadd    %1, %1, %2 \n"
         "mov     %0, %1, asr #16 \n"
         : "=r" (x), "=&r" (xs), "=r" (ys)
         : "0" (x), "r" (y)

You cannot write to input operands. Furthermore, because xs and ys have
undefined values, the compiler may assume they are the same and allocate them
to the same register. Notice also the use of an early-clobber ("&") for a
value that is written before all inputs have been read.
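
As a minimal sketch of the early-clobber rule (not taken from the code above):
the helper name twice_plus is made up for illustration, and the non-ARM branch
is an assumption added only so the example builds on other hosts. The
temporary is written by the first instruction, while b is not read until the
second, so without "&" gcc would be free to put both in the same register.

```c
/* Hypothetical helper, not from the original post: returns 2*a + b. */
static int twice_plus(int a, int b)
{
#if defined(__arm__) && !defined(__thumb__)
    int t;
    __asm__ (
        "mov  %0, %1, lsl #1 \n\t"   /* t is written here ...              */
        "add  %0, %0, %2     \n\t"   /* ... but b is not read until here   */
        : "=&r" (t)                  /* '&': t must not share a register   */
                                     /* with either input                  */
        : "r" (a), "r" (b));
    return t;
#else
    /* Portable fallback (assumption) so the sketch builds on non-ARM hosts. */
    return (int)(2u * (unsigned)a + (unsigned)b);
#endif
}
```

Calling twice_plus(3, 4) should give 10 on either path.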


Furthermore, the "0" constraint on the input seems unnecessary. Changing it to
"r" gives gcc more freedom in register allocation, and results in better
code. A better sequence would be:

__inline__ static int add(int x, int y)
{
    int xs, ys;
    __asm__  __volatile__
    (
         "mov     %1, %3, lsl #16 \n"
         "mov     %2, %4, lsl #16 \n"
         "qadd    %1, %1, %2 \n"
         "mov     %0, %1, asr #16 \n"
         : "=r" (x), "=&r" (xs), "=r" (ys)
         : "r" (x), "r" (y)
    ) ;
    return (x);
}

In fact we don't need the temporaries at all:

__inline__ static int add(int x, int y)
{
    __asm__  __volatile__
    (
         "mov     %0, %2, lsl #16 \n"
         "mov     %1, %3, lsl #16 \n"
         "qadd    %0, %0, %1 \n"
         "mov     %0, %0, asr #16 \n"
         : "=&r" (x), "=r" (y)
         : "r" (x), "r" (y)
    ) ;
    return (x);
}
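
For checking results on a host without QADD, the four instructions can be
modelled in plain C. This is only a sketch under stated assumptions: the names
qadd32 and add_model are hypothetical, and it relies on gcc's behaviour of
sign-extending on a signed right shift and wrapping on unsigned-to-signed
conversion.

```c
#include <stdint.h>

/* Hypothetical model of QADD: signed 32-bit add that clamps on overflow. */
static int32_t qadd32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Hypothetical model of the sequence: shift up, saturating add, shift down. */
static int add_model(int x, int y)
{
    /* mov ..., lsl #16: done on unsigned values to avoid signed overflow */
    int32_t xs = (int32_t)((uint32_t)x << 16);
    int32_t ys = (int32_t)((uint32_t)y << 16);
    /* qadd */
    int32_t r = qadd32(xs, ys);
    /* mov ..., asr #16: gcc sign-extends when shifting a signed value right */
    return r >> 16;
}
```

With this model, add_model(32767, 1) stays at 32767 and add_model(-32768, -1)
stays at -32768, which is the 16-bit saturation the shifts are there to get
out of a 32-bit QADD.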

<snip>

> Compilation option used is
>
> arm_v4t_le-gcc  -march=armv5te -msoft-float -finline-functions -Winline
> -I. inline_test.c  -o  inline_out

Try adding -O2. 
Like most compilers, gcc generates very poor code if you don't turn 
optimization on.

Your program compiled to bad code because of the incorrect assembly
constraints. With the modifications described above, it compiles to just 22
instructions, including the call to printf. Examining the assembly shows that
gcc is generating an optimal sequence.

Paul

