Bug 43440 - Overwriting neon quad register does not clobber all included single registers
Overwriting neon quad register does not clobber all included single registers
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: target
4.5.0
: P3 normal
: 4.6.0
Assigned To: Not yet assigned to anyone
: wrong-code
: 43860 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-03-19 13:12 UTC by Kirill Batuzov
Modified: 2010-11-13 23:09 UTC (History)
6 users (show)

See Also:
Host:
Target: arm-unknown-linux-gnueabi
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-03-19 13:32:13


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Kirill Batuzov 2010-03-19 13:12:29 UTC
Function:
float y;
float foo(float x)
{
    y = x * x + x;
    asm volatile ("vmov.i8 q0, #0" ::: "q0");
    asm volatile ("vmov.i8 q1, #0" ::: "q1");
    asm volatile ("vmov.i8 q2, #0" ::: "q2");
    asm volatile ("vmov.i8 q3, #0" ::: "q3");
    asm volatile ("vmov.i8 q4, #0" ::: "q4");
    asm volatile ("vmov.i8 q5, #0" ::: "q5");
    asm volatile ("vmov.i8 q6, #0" ::: "q6");
    asm volatile ("vmov.i8 q7, #0" ::: "q7");
    return y;
}
being compiled with "-O -Wall -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp" always returns 0.

The cause of incorrect behaviour is that in function foo y is allocated on s15 which is not considered clobbered by "asm volatile ("vmov.i8 q3, #0" ::: "q3");". Mentioning in clobber lists s1 s2 and s3 alongside q0, s5 s6 and s7 alongside q1 and so on solves the problem: clobbered register got spilled. Omitting any of them makes GCC allocate y on this register and do not spill it.

I also noticed that only d8, d10, d12 and d14 got saved at the beginning of the function though all d8-d15 should be saved. This was mentioned in bug #42321 comment #1 also.

I tried this example on 4.4.3 and today's trunk.
Comment 1 Ramana Radhakrishnan 2010-03-19 13:32:13 UTC
Yes, there is no easy way of describing the container relationship of the Neon registers to the compiler at the minute. Thus the only work around at the moment is to indicate that all the registers contained as part of this register name are clobbered in the explicit asm statement.


Comment 2 Ramana Radhakrishnan 2010-03-20 10:51:25 UTC
TARGET_MD_ASSEMBLE_CLOBBERS might just be the help we need on this one - I think I have a patch which generates the following code. Does that look any better to you ? 


foo:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fmsr    s13, r0
        fmsr    s14, r0
        fmacs   s13, s14, s14
        fstmfdd sp!, {d8, d9, d10, d11, d12, d13, d14, d15}
        movw    r3, #:lower16:y
        sub     sp, sp, #8
        movt    r3, #:upper16:y
        fsts    s13, [sp, #4]
        fsts    s13, [r3, #0]
@ 5 "/tmp/fail.c" 1
        vmov.i8 q0, #0
@ 6 "/tmp/fail.c" 1
        vmov.i8 q1, #0
@ 7 "/tmp/fail.c" 1
        vmov.i8 q2, #0
@ 8 "/tmp/fail.c" 1
        vmov.i8 q3, #0
@ 9 "/tmp/fail.c" 1
        vmov.i8 q4, #0
@ 10 "/tmp/fail.c" 1
        vmov.i8 q5, #0
@ 11 "/tmp/fail.c" 1
        vmov.i8 q6, #0
@ 12 "/tmp/fail.c" 1
        vmov.i8 q7, #0
        ldr     r0, [sp, #4]    @ float
        add     sp, sp, #8
        fldmfdd sp!, {d8, d9, d10, d11, d12, d13, d14, d15}
        bx      lr
        .size   foo, .-foo
        .comm   y,4,4
        .ident  "GCC: (GNU) 4.5.0 20100319 (experimental)"


Comment 3 Kirill Batuzov 2010-03-20 12:35:17 UTC
(In reply to comment #2)
>  Does that look any better to you ? 
Yes, definitely better. Thank you for quick response.
Comment 4 Ramana Radhakrishnan 2010-03-20 12:42:23 UTC
Patch submitted here for comments.

http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00925.html

IMO the reasons as described in my email  is another motivation for Neon programmers to be using intrinsics rather than inline assembler and to improve in general Neon intrinsics. 

cheers
Ramana
Comment 5 Siarhei Siamashka 2010-03-21 03:33:26 UTC
I don't quite understand what's the problem: "This patch has the unhappy side effect of clobbering s0, s1 and s2 if s3 is used because that's the only way we can indicate that q0 is clobbered by the write to s0."

The proper solution seems to be extremely simple to me and it should do exactly the same what an application programmer would do to workaround the bug. Just when initially parsing clobber list do a simple text substitution "q0" -> "d0", "d1". Same for all the other "q" registers.
Comment 6 Siarhei Siamashka 2010-03-21 03:56:56 UTC
(In reply to comment #4)
> IMO the reasons as described in my email  is another motivation for Neon
> programmers to be using intrinsics rather than inline assembler and to improve
> in general Neon intrinsics.

The problem is that today neon intrinsics have a lot more issues in practice. The resulting code is way too slow to be usable, especially when gcc thinks that it is running out of registers and starts spilling variables to memory. Bug 43118 and bug 43364 are just some very basic examples of performance issues without looking any deeper. Not having many bugs in bugzilla for NEON intrinsics means that either they work good enough or nobody seriously uses them. At least for me it is the latter case.

Autovectorization is even worse than intrinsics.

Inline assembly has a few bugs, but they can be easily workarounded.

Sorry for this rant/offtopic. Just thought that you might be somewhat interested the opinion of someone from the other side of the fence :-)
Comment 7 Ramana Radhakrishnan 2010-03-21 09:23:26 UTC
(In reply to comment #5)

> The proper solution seems to be extremely simple to me and it should do exactly
> the same what an application programmer would do to workaround the bug. Just
> when initially parsing clobber list do a simple text substitution "q0" -> "d0",
> "d1". Same for all the other "q" registers.
> 


Maybe I wasn't clear enough. If you've written to s3 and used q0 some place in your program then you want to indicate that q0 has been clobbered. IIUC the way the Neon registers are represented in GCC is by having the same register "number" for s0, d0 and q0 - thus the clobber list generated for your inline asm statement at expand time is essentialy clobber ( reg:QI  68) or whatever be the register number for s0, d0 and q0. Thus at the time the register allocator comes along you've lost all information regarding the mode in which the register was used and it would think that just s0 was being clobbered. 

Thus the clean solution to this IMO is to have some way of representing a proper mode for each of the register numbers and then making the register allocator / other optimizers honour it. In theory it should work but in practice I'm not sure if that's the correct thing to do in stage4. 

This isn't a problem in standard C code because the RTL for accessing those variables would have the Mode in which the registers were being accessed and thus any overlapping can be detected. 

Hope that makes things clearer.
Ramana

Comment 8 Siarhei Siamashka 2010-03-21 10:05:13 UTC
What about just forbidding to use "q" registers in the inline assembly clobber list? Is it difficult to do?

As a nice bonus, the existing potentially unsafe inline assembly will fail to compile, will be spotted and will have to be fixed (forcing the application developer to manually convert clobber list to use "d" or "s" registers). It will also solve compatibility problems with the older versions of gcc which still have this bug and still might be in use for a very long time.
Comment 9 Ramana Radhakrishnan 2010-03-21 13:41:44 UTC
(In reply to comment #8)
> What about just forbidding to use "q" registers in the inline assembly clobber
> list? Is it difficult to do?
> 

It's not difficult to - 

The patch only penalises the cases where an s<reg> ends up clobbering another s<reg> in the same quad (or double for that matter) but the common case of the q<reg> or the d<reg> clobbering the s<reg> in the same quad is a necessity and correctness.  Thus if we wanted to allow q<reg> and d<reg> in the clobber list then we need to live with this restriction.

Ramana


Comment 10 Andrew Pinski 2010-04-22 23:21:04 UTC
*** Bug 43860 has been marked as a duplicate of this bug. ***
Comment 11 Richard Earnshaw 2010-11-13 23:08:31 UTC
Author: rearnsha
Date: Sat Nov 13 23:08:26 2010
New Revision: 166723

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=166723
Log:
	PR target/43440
	* tm.texi.in (OVERLAPPING_REGISTER_NAMES): Document new macro.
	* tm.texi: Regenerated.
	* output.h (decode_reg_name_and_count): Declare.
	* varasm.c (decode_reg_name_and_count): New function.
	(decode_reg_name): Reimplement using decode_reg_name_and_count.
	* reginfo.c (fix_register): Use decode_reg_name_and_count and 
	iterate over all regs used.
	* stmt.c (expand_asm_operands): Likewise.
	* arm/aout.h (OVERLAPPING_REGISTER_NAMES): Define.
	(ADDITIONAL_REGISTER_NAMES): Remove aliases that overlap
	multiple machine registers.

Modified:
    trunk/gcc/ChangeLog
Comment 12 Richard Earnshaw 2010-11-13 23:09:47 UTC
Fixed on trunk.