Bug 42321 - NEON/VFP registers from inline assembly clobber list are saved/restored incorrectly
Summary: NEON/VFP registers from inline assembly clobber list are saved/restored incor...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.4.2
Importance: P2 normal
Target Milestone: 4.5.0
Assignee: Richard Earnshaw
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2009-12-07 14:33 UTC by Siarhei Siamashka
Modified: 2010-04-12 11:42 UTC
See Also:
Host: armv4tl-softfloat-linux-gnueabi, x86_64-linux
Target: armv4tl-softfloat-linux-gnueabi, arm-eabi
Build: armv4tl-softfloat-linux-gnueabi, x86_64-linux
Known to work:
Known to fail: 4.4.2 4.5.0
Last reconfirmed: 2010-03-21 15:58:54


Description Siarhei Siamashka 2009-12-07 14:33:57 UTC
Test program:
/************************/
void f()
{
    asm volatile("veor d8, d8, d8" : : :"d8","d9","d10","d11","d14","d15");
}
/************************/

$ gcc -c -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -O2 test.c
$ objdump -d test.o

00000000 <f>:
   0:   ed2d8b08        vpush   {d8-d11}
   4:   ed2deb04        vpush   {d14-d15}
   8:   f3088118        veor    d8, d8, d8
   c:   ecbd8b08        vpop    {d8-d11}
  10:   ecbdeb04        vpop    {d14-d15}
  14:   e12fff1e        bx      lr

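The two vpop instructions are in the wrong order. The stack is last-in, first-out, so the epilogue should pop {d14-d15} first and {d8-d11} second, mirroring the pushes in reverse.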
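The effect of the swapped pops can be reproduced with a small model. The sketch below is not GCC or ARM code: vfp_model, vpush, vpop and run_f are hypothetical names, and the model assumes a full-descending stack where the lowest-numbered register of a list sits at the lowest address. The loop that zeroes the registers is only a stand-in for an asm body that really overwrites everything in its clobber list.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: 32 D registers and a descending stack of 64-bit slots. */
typedef struct {
    uint64_t d[32];
    uint64_t stack[64];
    int sp;                 /* index of the lowest occupied slot */
} vfp_model;

static void model_init(vfp_model *m)
{
    for (int i = 0; i < 32; i++)
        m->d[i] = 1000u + (uint64_t)i;   /* recognisable value per register */
    m->sp = 64;                          /* empty stack */
}

/* vpush {d<first>-d<first+count-1>}: lowest register at lowest address. */
static void vpush(vfp_model *m, int first, int count)
{
    assert(first >= 0 && count > 0 && first + count <= 32 && m->sp >= count);
    m->sp -= count;
    for (int i = 0; i < count; i++)
        m->stack[m->sp + i] = m->d[first + i];
}

/* vpop {d<first>-d<first+count-1>}: reload from the current top of stack. */
static void vpop(vfp_model *m, int first, int count)
{
    assert(first >= 0 && count > 0 && first + count <= 32 && m->sp + count <= 64);
    for (int i = 0; i < count; i++)
        m->d[first + i] = m->stack[m->sp + i];
    m->sp += count;
}

/* Runs the prologue/body/epilogue of f() and returns how many of d8..d15
   hold a value different from the one they had on entry. */
int run_f(int buggy_epilogue)
{
    vfp_model m;
    int corrupted = 0;

    model_init(&m);
    vpush(&m, 8, 4);        /* vpush {d8-d11}  */
    vpush(&m, 14, 2);       /* vpush {d14-d15} */

    /* Stand-in for the asm body: overwrite every clobbered register. */
    for (int i = 8; i <= 15; i++)
        if (i != 12 && i != 13)
            m.d[i] = 0;

    if (buggy_epilogue) {   /* order emitted by the affected compilers */
        vpop(&m, 8, 4);
        vpop(&m, 14, 2);
    } else {                /* reverse of the push order */
        vpop(&m, 14, 2);
        vpop(&m, 8, 4);
    }

    for (int i = 8; i <= 15; i++)
        if (m.d[i] != 1000u + (uint64_t)i)
            corrupted++;
    return corrupted;
}
```

In this model the stack pointer ends up balanced in both cases, so the function still returns normally; with the buggy order, however, all six callee-saved registers come back holding another register's value.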
Comment 1 Siarhei Siamashka 2009-12-07 14:42:13 UTC
Modifying the program to list q-registers in the clobber list provides even more interesting results:
/************************/
void f()
{
    asm volatile("veor d8, d8, d8" : : :"q4","q5","q7");
}
/************************/

$ gcc -c -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -O2 test.c
$ objdump -d test.o

00000000 <f>:
   0:   ed2d8b02        vpush   {d8}
   4:   ed2dab02        vpush   {d10}
   8:   ed2deb02        vpush   {d14}
   c:   f3088118        veor    d8, d8, d8
  10:   ecbd8b02        vpop    {d8}
  14:   ecbdab02        vpop    {d10}
  18:   ecbdeb02        vpop    {d14}
  1c:   e12fff1e        bx      lr

Now, in addition to the mismatched save/restore order, only the lower halves of the q-registers get saved.
Comment 2 Ramana Radhakrishnan 2009-12-07 15:52:27 UTC
Also appears with trunk as of today.
Comment 3 Richard Earnshaw 2009-12-07 15:55:56 UTC
I can confirm both of these issues.

In asm statements, GCC currently just treats 'q4' and 'd8' as aliases for s16 (which of course is just a 32-bit register); there's currently no way of expressing that a larger entity is clobbered.  Of course, you won't see that in the prologue/epilogue code because the whole D register is saved even if just part of it has been used, but it could cause data-flow related issues elsewhere.
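The aliasing described here is enough to reproduce the register lists seen in comment 1. The following is a minimal sketch under two assumptions: the architectural overlap qN = d(2N), d(2N+1) = s(4N)..s(4N+3), and a reading of the comment in which a 'qN' clobber marks only s(4N) as used while saves are done in whole D registers. Both function names are hypothetical and nothing here is taken from GCC's sources.

```c
#include <assert.h>
#include <stdint.h>

/* D registers that a list of Q clobbers really covers:
   qN overlaps d(2N) and d(2N+1). Bit i of the result stands for d<i>. */
uint32_t q_clobbers_full_dmask(const int *q, int n)
{
    uint32_t mask = 0;
    for (int i = 0; i < n; i++) {
        assert(q[i] >= 0 && q[i] < 16);
        mask |= UINT32_C(3) << (2 * q[i]);
    }
    return mask;
}

/* D registers saved if qN is treated as an alias of s(4N) only
   (assumed reading of the comment above): s(4N) lives in d(2N),
   and the whole containing D register is saved. */
uint32_t q_clobbers_aliased_dmask(const int *q, int n)
{
    uint32_t mask = 0;
    for (int i = 0; i < n; i++) {
        assert(q[i] >= 0 && q[i] < 16);
        int s = 4 * q[i];                 /* first S register of qN */
        mask |= UINT32_C(1) << (s / 2);   /* D register containing it */
    }
    return mask;
}
```

For the clobber list "q4","q5","q7" the first function yields d8-d11 and d14-d15 (the same set as the original test program), while the second yields only d8, d10 and d14, which matches the three single-register vpush instructions in the disassembly.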
Comment 4 Ramana Radhakrishnan 2009-12-09 16:55:17 UTC
This occurs with arm-eabi cross as well.

Ramana
Comment 5 Ramana Radhakrishnan 2010-01-14 16:21:04 UTC
I took a cursory look at this case. This looks like a bug in the
backend, specifically in arm_output_epilogue, where the epilogue code
isn't designed to cope with restoring disjoint sets of registers from
the stack when the frame pointer is eliminated.

The epilogue code for restoring these registers works fine when
either of the following holds:

 a. The registers to be restored form a single sequence for a load
multiple, irrespective of whether the frame pointer is used or not.

 b. The function ends up using a frame pointer.

We need a separate case for when the frame pointer is not required and
one has more than one sequence to restore.
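As an illustration of the missing case (this is not the code in arm_output_epilogue, and the eventual patch may be structured differently), the sketch below splits a D-register mask into runs of consecutive registers, emits one vpush per run in ascending order, and emits the matching vpop instructions in the opposite order. The names d_run, find_runs, emit_prologue and emit_epilogue are hypothetical, and the sketch ignores any per-instruction limit on the length of a register list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { int first; int count; } d_run;

/* Split a D-register mask (bit i = d<i>) into ascending runs of
   consecutive registers. At most 16 runs fit in 32 bits. */
int find_runs(uint32_t dmask, d_run runs[16])
{
    int n = 0;
    for (int r = 0; r < 32; ) {
        if (!(dmask & (UINT32_C(1) << r))) { r++; continue; }
        int first = r;
        while (r < 32 && (dmask & (UINT32_C(1) << r)))
            r++;
        runs[n].first = first;
        runs[n].count = r - first;
        n++;
    }
    return n;
}

/* Append one "vpush"/"vpop" line for a run; returns the new length. */
static size_t emit(char *out, size_t cap, size_t len, const char *op, d_run run)
{
    int n;
    if (len >= cap)
        return len;                       /* buffer already full */
    if (run.count == 1)
        n = snprintf(out + len, cap - len, "%s {d%d}\n", op, run.first);
    else
        n = snprintf(out + len, cap - len, "%s {d%d-d%d}\n", op,
                     run.first, run.first + run.count - 1);
    return n < 0 ? len : len + (size_t)n;
}

/* Pushes go out in ascending run order. */
void emit_prologue(uint32_t dmask, char *out, size_t cap)
{
    d_run runs[16];
    int n = find_runs(dmask, runs);
    size_t len = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (int i = 0; i < n; i++)
        len = emit(out, cap, len, "vpush", runs[i]);
}

/* Pops must mirror the pushes, so walk the runs in reverse. */
void emit_epilogue(uint32_t dmask, char *out, size_t cap)
{
    d_run runs[16];
    int n = find_runs(dmask, runs);
    size_t len = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (int i = n - 1; i >= 0; i--)
        len = emit(out, cap, len, "vpop", runs[i]);
}
```

For the mask covering d8-d11 and d14-d15 (0xCF00), emit_prologue produces the two vpush lines from the original disassembly and emit_epilogue produces "vpop {d14-d15}" followed by "vpop {d8-d11}".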
Comment 6 Richard Earnshaw 2010-03-21 15:58:54 UTC
testing fix
Comment 7 Richard Earnshaw 2010-03-21 20:27:14 UTC
Subject: Bug 42321

Author: rearnsha
Date: Sun Mar 21 20:27:00 2010
New Revision: 157609

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157609
Log:
	PR target/42321
	* arm.c (arm_output_epilogue): Correctly match VFP pop instructions
	with their corresponding prologue pushes.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c

Comment 8 Richard Earnshaw 2010-03-21 20:30:03 UTC
Fixed in trunk