Bug 44364 - Wrong code with e500 double floating point
Wrong code with e500 double floating point
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: target
4.4.4
: P3 normal
: ---
Assigned To: Alan Modra
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-06-01 13:41 UTC by Sebastian Andrzej Siewior
Modified: 2013-02-07 01:44 UTC (History)
4 users (show)

See Also:
Host: powerpc-linux-gnuspe
Target: powerpc-linux-gnuspe
Build: powerpc-linux-gnuspe
Known to work:
Known to fail:
Last reconfirmed: 2010-06-07 04:41:52


Attachments
test case (6.12 KB, text/plain)
2010-06-01 13:42 UTC, Sebastian Andrzej Siewior
Details
objdump of the compiled testcase (27.47 KB, text/plain)
2010-06-01 13:42 UTC, Sebastian Andrzej Siewior
Details
Combined trivial testcase (For -O0) (276 bytes, text/plain)
2010-06-03 20:21 UTC, Kyle Moffett
Details
Multipart trivial testcase (For -O3) (196 bytes, text/plain)
2010-06-03 20:22 UTC, Kyle Moffett
Details
Multipart trivial testcase (For -O3) part 2 (20 bytes, text/plain)
2010-06-03 20:22 UTC, Kyle Moffett
Details
Multipart trivial testcase (For -O3) part 3 (132 bytes, text/plain)
2010-06-03 20:22 UTC, Kyle Moffett
Details
Combined trivial testcase objdump result (Built with -O0) (11.41 KB, text/plain)
2010-06-03 20:26 UTC, Kyle Moffett
Details
Multipart trivial testcase objdump result (Built with -O3) (10.73 KB, text/plain)
2010-06-03 20:26 UTC, Kyle Moffett
Details
Updated tc-lossings-floats.c (1.94 KB, text/plain)
2010-06-04 01:37 UTC, Kyle Moffett
Details
Updated tc-lossings-floats.objdump (28.67 KB, text/plain)
2010-06-04 01:37 UTC, Kyle Moffett
Details
Even tinier tc-lossings-floats.c (1.57 KB, text/plain)
2010-06-04 02:17 UTC, Kyle Moffett
Details
Even tinier tc-lossings-floats.objdump (28.63 KB, text/plain)
2010-06-04 02:17 UTC, Kyle Moffett
Details
Minimal test with -O1 (983 bytes, text/plain)
2010-06-04 17:24 UTC, Kyle Moffett
Details
Minimal test objdump with -O1 (25.71 KB, text/plain)
2010-06-04 17:24 UTC, Kyle Moffett
Details
this fails to compile in -O2 with the fix (35.40 KB, text/plain)
2010-06-09 07:50 UTC, Sebastian Andrzej Siewior
Details
e500.h and caller-save.c patch (1.46 KB, patch)
2010-06-09 13:26 UTC, Alan Modra
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sebastian Andrzej Siewior 2010-06-01 13:41:51 UTC
The testcase (which is stripped down perl code) attached compiled with:
gcc -fPIC -fno-strict-aliasing -pipe \
         -O2 \
         -g -o tc-lossings-floats tc-lossings-floats.c -Wall -mno-isel

results in after executing:
|RESET: 252.000000 | 0.000000
|RESET: 504.000000 | 252.000000
|RESET: 756.000000 | 504.000000
|RESET: 1008.000000 | 756.000000
|RESET: 1260.000000 | 1008.000000
|RESET: 1512.000000 | 1260.000000
|RESET: 1764.000000 | 1512.000000
|RESET: 2016.000000 | 1764.000000
|RESET: 2268.000000 | 2016.000000
 => 2268.000000
|RESET: 2520.000000 | 0.000000
|RESET: 2772.000000 | 2520.000000

With -O1 instead -O2:
|RESET: 252.000000 | 0.000000
|RESET: 504.000000 | 252.000000
|RESET: 756.000000 | 504.000000
|RESET: 1008.000000 | 756.000000
|RESET: 1260.000000 | 1008.000000
|RESET: 1512.000000 | 1260.000000
|RESET: 1764.000000 | 1512.000000
|RESET: 2016.000000 | 1764.000000
|RESET: 2268.000000 | 2016.000000
 => 2268.000000
|RESET: 2520.000000 | 2268.000000
|RESET: 2772.000000 | 2520.000000

The "=>" line sets the the second value in the "|RESET" line. With -O1 is remains where it is, with -O2 it gets overwritten. The original perl code gets here a totally random values. 

Now. It is getting better. The source compiled with -S and the resulting assembly file assembled with gcc-4.3 does not show this problem. After diffing of the two resulting binaries I saw  a difference in __floatdidf() which is called from Kino_OutStream_tell(). The variant which is attached by the 4.3 compiler does not use floating point instead it uses integer code which calls other functions (__floatsidf, __muldf3, __floatunsidf, __adddf3) which use also interger code. The 4.3 compiler was not compiled with --enable-e500_double. 

So after looking at the code I saw now the following:
10000c24 <__floatdidf>:
10000c6c:       11 23 1a 2c     evmergehi r9,r3,r3

This function is touching the complete 64bit r9 register
The code which calls it:

10000a40:       91 21 00 20     stw     r9,32(r1)
10000a44:       4e 80 04 21     bctrl   # tell function which in turn calls floatdidf
 
10000a7c:       81 21 00 20     lwz     r9,32(r1)
10000a80:       38 60 00 00     li      r3,0    
10000a84:       7e 33 8b 78     mr      r19,r17 
10000a88:       12 49 92 e1     efdsub  r18,r9,r18
10000a8c:       10 80 92 fa     efdctsiz r4,r18
10000a90:       12 49 4a 17     evmr    r18,r9  

So we just save the lower 32bit of r9 before calling the function and the upper 32bit are overwritten by efdsub.
Comment 1 Sebastian Andrzej Siewior 2010-06-01 13:42:33 UTC
Created attachment 20793 [details]
test case
Comment 2 Sebastian Andrzej Siewior 2010-06-01 13:42:57 UTC
Created attachment 20794 [details]
objdump of the compiled testcase
Comment 3 Kyle Moffett 2010-06-03 19:26:33 UTC
(In reply to comment #0)
> So after looking at the code I saw now the following:
> 10000c24 <__floatdidf>:
> 10000c6c:       11 23 1a 2c     evmergehi r9,r3,r3
> 
> This function is touching the complete 64bit r9 register
> The code which calls it:
> 
> 10000a40:       91 21 00 20     stw     r9,32(r1)
> 10000a44:       4e 80 04 21     bctrl   # tell function which in turn calls
> floatdidf
> 
> 10000a7c:       81 21 00 20     lwz     r9,32(r1)
> 10000a80:       38 60 00 00     li      r3,0    
> 10000a84:       7e 33 8b 78     mr      r19,r17 
> 10000a88:       12 49 92 e1     efdsub  r18,r9,r18
> 10000a8c:       10 80 92 fa     efdctsiz r4,r18
> 10000a90:       12 49 4a 17     evmr    r18,r9  
> 
> So we just save the lower 32bit of r9 before calling the function and the upper
> 32bit are overwritten by efdsub.

Ok, here's the e500 ABI user guide:
  http://www.freescale.com/files/32bit/doc/ref_manual/E500ABIUG.pdf

According to that guide, we have the following usage of GPRs:
  Dedicated: r1 r2 r13
  Nonvolatile: r14-r31
  Volatile: r0 r3-r12

Then:
  Before a function changes the value in the upper word of any nonvolatile general register, rn, it shall save the 64-bit value in rn in the 64-bit general register save area 8*(32-n) bytes below the CR save area (plus any required padding).  The 64-bit general save area shall have 8-byte alignment.

And:
  Before a function changes the value in the lower word of any nonvolatile general register, rn, that has not already been saved in the 64-bit general register save area, it shall save the value in the lower word of rn in the word in the 32-bit general register save area 4*(32-n) bytes before the back chain word of the previous frame.

So clearly the caller's assembly is wrong; it should be saving all 64-bits of r9 (volatile gpr) first.

Cheers,
Kyle Moffett

Comment 4 Kyle Moffett 2010-06-03 20:09:12 UTC
Ok, I have a trivial 19-line testcase that triggers the bug on my native Debian GCC 4.4.4-2+powerpcspe1 (with PR44169 fix) with -O0 and -O3.  The compiler was built with: --with-cpu=8548 --enable-e500_double --with-long-double-128
  In order to make it work on -O3 I had to split the "double z", "void problem()" and "int main()" into 3 separate files to prevent GCC from inlining everything.

Compiled with:
  gcc -Wall -Wextra -Werror -ggdb3 -O0 -o testall testall.c
Output (the two "main: a" lines should read the same):
  main: a = 78.463000
  problem: a = 39.231500
  main: a = 39.231476

Or with:
  gcc -Wall -Wextra -Werror -ggdb3 -O3 -o test.o -c test.c
  gcc -Wall -Wextra -Werror -ggdb3 -O3 -o test2.o -c test2.c
  gcc -Wall -Wextra -Werror -ggdb3 -O3 -o test3.o -c test3.c
  gcc -Wall -Wextra -Werror -ggdb3 -O3 -o test test.o test2.o test3.o
Output:  (again, "main" lines should be identical)
  main: a = 78.463000
  problem: a = 39.231500
  main: a = 78.462952

As you might imagine, a bug this bad triggers testsuite failures in MPFR, GMP, PostgreSQL, lapack, and basically everything else with a comprehensive floating-point test-suite.

Cheers,
Kyle Moffett
Comment 5 Sebastian Andrzej Siewior 2010-06-03 20:17:28 UTC
>So clearly the caller's assembly is wrong; it should be saving all 64-bits of
>r9 (volatile gpr) first.

Yes, that it what I've been pointing out. There is an optimization in the stack code which uses 32bit stores/loads if no double types are used.
I looks to me like the functions branch and register saves are done by the RTL and it does not know about the 64bit GPRs here. This is only a guess, I'm not a gcc guy.
Comment 6 Kyle Moffett 2010-06-03 20:21:32 UTC
Created attachment 20823 [details]
Combined trivial testcase (For -O0)
Comment 7 Kyle Moffett 2010-06-03 20:22:01 UTC
Created attachment 20824 [details]
Multipart trivial testcase (For -O3)
Comment 8 Kyle Moffett 2010-06-03 20:22:32 UTC
Created attachment 20825 [details]
Multipart trivial testcase (For -O3) part 2
Comment 9 Kyle Moffett 2010-06-03 20:22:51 UTC
Created attachment 20826 [details]
Multipart trivial testcase (For -O3) part 3
Comment 10 Kyle Moffett 2010-06-03 20:26:21 UTC
Created attachment 20827 [details]
Combined trivial testcase objdump result (Built with -O0)
Comment 11 Kyle Moffett 2010-06-03 20:26:47 UTC
Created attachment 20828 [details]
Multipart trivial testcase objdump result (Built with -O3)
Comment 12 Kyle Moffett 2010-06-03 23:18:02 UTC
Comment on attachment 20823 [details]
Combined trivial testcase (For -O0)

Scratch my test cases... "register asm(...)" doesn't work the way I thought it did... Sebastian's test case is the only one that I've found that works.
Comment 13 Kyle Moffett 2010-06-04 01:37:09 UTC
Created attachment 20829 [details]
Updated tc-lossings-floats.c

I sat down with GCC and vim for a couple hours narrowing down Sebastian's testcase.  This file of ~100 lines exhibits the exact same problem as the original testcase.
Comment 14 Kyle Moffett 2010-06-04 01:37:39 UTC
Created attachment 20830 [details]
Updated tc-lossings-floats.objdump
Comment 15 Kyle Moffett 2010-06-04 02:17:05 UTC
Created attachment 20831 [details]
Even tinier tc-lossings-floats.c

Spent a bit more time on the testcase and made it even smaller.  The bug goes away if I make any of the following changes:

  (1) Make "counter" a nonstatic local variable or "iter" a static global
  (2) Call dotell() directly instead of through a function pointer
  (3) Eliminate the references to frq_out->buf_start or prx_out->buf_start, even though they are always zero.
Comment 16 Kyle Moffett 2010-06-04 02:17:37 UTC
Created attachment 20832 [details]
Even tinier tc-lossings-floats.objdump
Comment 17 Kyle Moffett 2010-06-04 17:24:14 UTC
Created attachment 20840 [details]
Minimal test with -O1

I've managed to shrink this down to a 44-line testcase that fails with the options listed below.  The code is the same but the failure mode is different;  Specifically, I get this output:
|RESET: 252.000000 | 4294967296.000000
|RESET: 504.000000 | 4294967296.000000
|RESET: 756.000000 | 4294967296.000000
|RESET: 1008.000000 | 4294967296.000000
|RESET: 1260.000000 | 4294967296.000000
|RESET: 1512.000000 | 4294967296.000000
|RESET: 1764.000000 | 4294967296.000000
|RESET: 2016.000000 | 4294967296.000000
|RESET: 2268.000000 | 4294967296.000000
 => 2268.000000
|RESET: 2520.000000 | 4294967296.000000
|RESET: 2772.000000 | 4294967296.000000

From the code, the second floating point value on each line should be equal to the first float from the previous line.  If I change the first "(unsigned long long)" cast to a "(long long)" cast, the bug goes away and I see the following (correct) output:
|RESET: 252.000000 | 0.000000
|RESET: 504.000000 | 252.000000
|RESET: 756.000000 | 504.000000
|RESET: 1008.000000 | 756.000000
|RESET: 1260.000000 | 1008.000000
|RESET: 1512.000000 | 1260.000000
|RESET: 1764.000000 | 1512.000000
|RESET: 2016.000000 | 1764.000000
|RESET: 2268.000000 | 2016.000000
 => 2268.000000
|RESET: 2520.000000 | 2268.000000
|RESET: 2772.000000 | 2520.000000

The full list of compile options:
	-pipe \
	-g \
	-Wall \
	-fmove-loop-invariants \
	-fcaller-saves \
	--param max-aliased-vops=100 \
	--param avg-aliased-vops=1 \
	--param max-fields-for-field-sensitive=0 \
	--param loop-invariant-max-bbs-in-loop=1000 \
	-O \
	-fno-web \
	-fno-rename-registers \
	-fno-unroll-loops \
	-fno-peel-loops \
	-fno-split-ivs-in-unroller \
	-fno-peephole \
	-fno-dce \
	-fno-dse \
	-fno-unit-at-a-time \
	-fno-omit-frame-pointer \
	-fno-auto-inc-dec \
	-fno-cprop-registers \
	-fno-dce \
	-fno-defer-pop \
	-fno-delayed-branch \
	-fno-dse \
	-fno-guess-branch-probability \
	-fno-if-conversion2 \
	-fno-if-conversion \
	-fno-inline-small-functions \
	-fno-inline-functions-called-once \
	-fno-early-inlining \
	-fno-ipa-pure-const \
	-fno-ipa-reference \
	-fno-merge-constants \
	-fno-split-wide-types \
	-fno-tree-builtin-call-dce \
	-fno-tree-ccp \
	-fno-tree-ch \
	-fno-tree-copyrename \
	-fno-tree-dce \
	-fno-tree-dominator-opts \
	-fno-tree-dse \
	-fno-tree-fre \
	-fno-tree-sra \
	-fno-tree-ter \
	-fno-merge-constants \
	-fno-branch-count-reg \
	-fno-function-cse \
	-fno-thread-jumps \
	-fno-gcse \
	-fno-rerun-cse-after-loop \
	-fno-cse-skip-blocks \
	-fno-cse-follow-jumps
Comment 18 Kyle Moffett 2010-06-04 17:24:47 UTC
Created attachment 20841 [details]
Minimal test objdump with -O1
Comment 19 Alan Modra 2010-06-06 14:11:46 UTC
Confirmed.

Regarding O1test.c:
Wierd set of gcc options, particularly -fno-dce and -fcaller-saves.  I can't see any sane reason why you would use those options on powerpc, unless you were deliberately stress testing gcc..  It's quite possible that your O1test.c is tickling an entirely different bug than in the original testcase.  Trimming the options a bit, I can reproduce the O1test.c problem with only
-fcaller-saves -O -fno-omit-frame-pointer -fno-dce -fno-split-wide-types
Comment 20 Alan Modra 2010-06-06 14:52:31 UTC
My guess is that tc-lossings-floats.c hits an ira related problem, but I'm not particularly familiar with that area of the compiler so won't look further myself.
Comment 21 Kyle Moffett 2010-06-06 15:37:35 UTC
I was trying to strip down the testcase and try to see which optimization flags caused it.  I started from -O2 and tried to see which -O2 flags (in addition to O1) were needed to cause the problem.  From there I simplified the testcase as small as I could get it and turned off as many other optimizations as possible.  As far as I can tell, it seems to be a combination of -fcaller-saves and the minimal CSE that -O1 always turns on (regardless of other -fno-* options).

Part of the reason for the extra -fno-* compiler options was to prevent the compiler from moving registers around so much as I fiddled with the testcase, since the bug is only triggered when it allocates one of the 2 doubles in a register that is reused by one of the libgcc functions it calls.

Other than that, I'm just grasping at straws...

If somebody can debug this and fix both testcases before FreeScale gets back to us with a fix I'll ship them a case or two of their favorite beer (or whatever other beverage they'd prefer).
Comment 22 Alan Modra 2010-06-07 04:41:52 UTC
Adding the following to config/rs6000/e500.h will likely fix the bug.  Testing..

#define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
  (TARGET_E500_DOUBLE && ((MODE) == DFmode || (MODE) == TFmode		\
			  || (MODE) == DCmode || (MODE) == TCmode)	\
   ? (MODE)								\
   : choose_hard_reg_mode ((REGNO), (NREGS), false))
Comment 23 Kyle Moffett 2010-06-07 05:36:36 UTC
(In reply to comment #22)
> Adding the following to config/rs6000/e500.h will likely fix the bug. 
> Testing..

Oh, very nice!  Thanks for the speedy assistance on this!  I've got my own test compiler building now; I'll let you know what the result is as soon as its done!

Cheers,
Kyle Moffett
Comment 24 Kyle Moffett 2010-06-07 05:44:19 UTC
Hmm, unfortunately in my preliminary testing this does not seem to fix either testcase.

Cheers,
Kyle Moffett

Comment 25 Alan Modra 2010-06-07 09:53:34 UTC
Yes it seems the patch is not sufficient on 4.4.  On mainline the code looks good by inspection.  (I don't have e500 hardware to run tests on.)
Comment 26 Alan Modra 2010-06-07 10:29:57 UTC
Doh!  No, it's still broken on mainline too.  I wasn't testing what I thought I was...
Comment 27 Kyle Moffett 2010-06-07 12:49:07 UTC
(In reply to comment #25)
> Yes it seems the patch is not sufficient on 4.4.  On mainline the code looks
> good by inspection.  (I don't have e500 hardware to run tests on.)

If you'd like login access to one of our protoboards, just post an SSH public key on bugzilla and we'll set up port-forwarded access to one of our Debian buildd servers.  We're thankful for all the assistance you've been able to provide.

Comment 28 Alan Modra 2010-06-07 17:05:39 UTC
Please bootstrap and test this addition to e500.h

/* When setting up caller-save slots (MODE == VOIDmode) ensure we
   allocate space for DFmode.  Save gprs in the correct mode too.  */
#define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
  (TARGET_E500_DOUBLE && ((MODE) == VOIDmode || (MODE) == DFmode)	\
   ? DFmode								\
   : choose_hard_reg_mode ((REGNO), (NREGS), false))
Comment 29 Kyle Moffett 2010-06-07 18:28:30 UTC
Awesome!!! Both of our testcases that were failing pass with this patch applied!

I'm not going to call it a 100% victory yet, I want to rebuild our native compilers and build-and-run the PostgreSQL, GMP and MPFR testsuites first.

Here's hoping though! *crosses fingers*
Comment 30 Kyle Moffett 2010-06-07 18:56:09 UTC
Ok, the cross-compiler built with this patch fails to build a native GCC for the target with the following ICE:

../../../src/libgcc/../libdecnumber/decLibrary.c: In function 'isinfd128':
../../../src/libgcc/../libdecnumber/decLibrary.c:71: error: unrecognizable insn: (insn 6 5 7 2 ../../../src/libgcc/../libdecnumber/decLibrary.c:68 (set (subreg:TI (reg:TD 122 [ arg ]) 0) (reg:TI 123)) -1 (nil))
../../../src/libgcc/../libdecnumber/decLibrary.c:71: internal compiler error: in extract_insn, at recog.c:2001
Comment 31 Kyle Moffett 2010-06-08 04:24:07 UTC
Alan,

Now that I've corrected all of my compiler issues, your first patch (the one quoted below) works like a charm.  I still need to do the extensive archive-rebuild and testsuite run to verify that it completely works, but that will take a couple days.

Looking good!

Cheers,
Kyle Moffett

(In reply to comment #28)
> /* When setting up caller-save slots (MODE == VOIDmode) ensure we
>    allocate space for DFmode.  Save gprs in the correct mode too.  */
> #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
>   (TARGET_E500_DOUBLE && ((MODE) == VOIDmode || (MODE) == DFmode)       \
>    ? DFmode                                                             \
>    : choose_hard_reg_mode ((REGNO), (NREGS), false))
Comment 32 Kyle Moffett 2010-06-08 05:07:47 UTC
The first working patch (VOIDmode||DFmode) just successfully passed full GMP and MPFR testsuites on the e500 boards.  PPL and EGLIBC are currently rebuilding.
Comment 33 Kyle Moffett 2010-06-08 06:48:04 UTC
EGLIBC and PPL are still building; I'm heading to sleep and I'll check on them later this morning.
Comment 34 Sebastian Andrzej Siewior 2010-06-08 11:23:48 UTC
(In reply to comment #28)
> #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
>   (TARGET_E500_DOUBLE && ((MODE) == VOIDmode || (MODE) == DFmode)       \
>    ? DFmode                                                             \
>    : choose_hard_reg_mode ((REGNO), (NREGS), false))
> 
I applied it on 4.4 branch. The big Perl thing is passes its testsuite with this.
Also I don't see evldd X, stw X, ld X constructs, there is just one evldd. It 
looks good so far.
Comment 35 Kyle Moffett 2010-06-08 15:23:44 UTC
Hrm, well PPL still seems to be failing the "interval1" test, but I'm not sure that one's related as the part that fails is "test01<float>".  More info to come shortly.
Comment 36 Kyle Moffett 2010-06-08 20:34:19 UTC
Ok, I'm pretty sure this is unrelated to this bug, but I still get one test-failure with PPL 0.10.2.  The "interval1" test fails due to the "test01<float>" subtest, apparently due to very slightly excessive "float" rounding errors (after 100 rounds of the equivalent of "x = (x + (2/x)) / 2" it approximates ~0.02% past the maximum acceptable error bounds).

The "double" testcase passes just fine (although with the same error bounds and extra precision that seems perfectly reasonable).
Comment 37 Sebastian Andrzej Siewior 2010-06-09 07:50:54 UTC
Created attachment 20873 [details]
this fails to compile in -O2 with the fix
Comment 38 Sebastian Andrzej Siewior 2010-06-09 07:54:00 UTC
(In reply to comment #28)
> Please bootstrap and test this addition to e500.h
> 
> /* When setting up caller-save slots (MODE == VOIDmode) ensure we
>    allocate space for DFmode.  Save gprs in the correct mode too.  */
> #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
>   (TARGET_E500_DOUBLE && ((MODE) == VOIDmode || (MODE) == DFmode)       \
>    ? DFmode                                                             \
>    : choose_hard_reg_mode ((REGNO), (NREGS), false))
> 

Okay. Now I found something: 

inst/bin/powerpc-linux-gnuspe-gcc extract_chmLib.i -o extract_chmLib.S -S -O2
extract_chmLib.c: In function '_extract_callback':
extract_chmLib.c:29: internal compiler error: in change_address_1, at emit-rtl.c:1954
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

extract_chmLib.i is attached. Adding -mfloat-gprs=single which avoids using 64bit gprs for double makes this go away.
Comment 39 Harry He 2010-06-09 08:59:29 UTC
Hi,  Kyle Moffett,
    in testall.c, r9 is used by a register variable, however, in E500ABI guide, r9 should be used for parameter passing, this test case seems not reasonable.

Harry He
Comment 40 Harry He 2010-06-09 09:23:26 UTC
with my toolchain (From CodeSourcery, 4.4-78), o1test gives correct behavior with built-in flags(-te500v2), but wrong behaviors with "-fcaller-saves -O -fno-omit-frame-pointer -fno-dce -fno-split-wide-types". Results are same even after I rebuilt the toolchain with the patch to e500.h.

is there any tricks here? 
Comment 41 Alan Modra 2010-06-09 13:26:19 UTC
Created attachment 20877 [details]
e500.h and caller-save.c patch

The ICE in #38 is due to a bug in caller-save.c
Comment 42 Sebastian Andrzej Siewior 2010-06-09 13:52:48 UTC
(In reply to comment #41)
> The ICE in #38 is due to a bug in caller-save.c

Thank you for the very quick fix.
Comment 43 Mark Workman 2010-06-09 14:07:11 UTC
(In reply to comment #39)
> Hi,  Kyle Moffett,
>     in testall.c, r9 is used by a register variable, however, in E500ABI guide,
> r9 should be used for parameter passing, this test case seems not reasonable.
> 
> Harry He

Please note that this testcase was removed by Kyle (see comment #12) and that neither the original testcase (tc-lossings-float.c) nor the trimmed testcase (tc-lossings-float-3.c) make such explicit use of particular registers.  Thus, it does appear that it is the compiler that is making the register assignments in question.

Cheers,
  Mark

> 

Comment 44 Harry He 2010-06-10 09:03:21 UTC
Thanks for your reminding, Mark.
Comment 45 Mark Workman 2010-06-10 17:43:15 UTC
(In reply to comment #40)
> with my toolchain (From CodeSourcery, 4.4-78), o1test gives correct behavior
> with built-in flags(-te500v2), but wrong behaviors with "-fcaller-saves -O
> -fno-omit-frame-pointer -fno-dce -fno-split-wide-types". Results are same even
> after I rebuilt the toolchain with the patch to e500.h.
> 
> is there any tricks here? 
> 

From what I can tell of CodeSourcery 4.4-78, it contains a heavily patched fork of gcc 4.4.1.  This bug is reported against gcc 4.4.4.  Harry, are you saying you believe this is a regression of gcc from 4.4.1 to 4.4.4?  Or does CodeSourcery have a (perhaps different) patch for this issue that should be submitted here?

Regards,
  Mark
Comment 46 Harry He 2010-06-11 01:09:20 UTC
Mark, I have reported the issue to CodeSourcery and waiting them to confirm, I guess our toolchain may need different patches for it, or there is something wrong in my validation.
Comment 47 Kyle Moffett 2010-06-21 15:55:51 UTC
(In reply to comment #41)
> Created an attachment (id=20877) [edit]
> e500.h and caller-save.c patch
> 
> The ICE in #38 is due to a bug in caller-save.c

We've basically rebuilt everything in the PowerPCSPE Debian port now, and as far as I can tell all of the test-cases affected by this bug are fixed by your attachment 20877 [details] (pr44364.diff).  Now that we've gotten further, Sebastian has identified a new floating point bug (PR44606), but I believe we can call this one closed.

Again, many thanks for all your help!

Cheers,
Kyle Moffett
Comment 48 Alan Modra 2010-06-22 10:44:33 UTC
Subject: Bug 44364

Author: amodra
Date: Tue Jun 22 10:44:11 2010
New Revision: 161163

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161163
Log:
	PR target/44364
	* config/rs6000/e500.h (HARD_REGNO_CALLER_SAVE_MODE): Define.
	* caller-save.c (insert_restore, insert_save): Use non-validate
	form of adjust_address.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/caller-save.c
    trunk/gcc/config/rs6000/e500.h

Comment 49 Sebastian Andrzej Siewior 2010-06-28 11:18:53 UTC
>Modified:
>    trunk/gcc/ChangeLog
>    trunk/gcc/caller-save.c
>    trunk/gcc/config/rs6000/e500.h

Is it possible to get this into the 4.4 and 4.5 branch as well?
Comment 50 Alan Modra 2013-02-07 01:44:07 UTC
Fixed all currently active branches.