Description
Sebastian Andrzej Siewior
2010-06-01 13:41:51 UTC
Created attachment 20793 [details]
test case
Created attachment 20794 [details]
objdump of the compiled testcase
(In reply to comment #0) > So after looking at the code I saw now the following: > 10000c24 <__floatdidf>: > 10000c6c: 11 23 1a 2c evmergehi r9,r3,r3 > > This function is touching the complete 64bit r9 register > The code which calls it: > > 10000a40: 91 21 00 20 stw r9,32(r1) > 10000a44: 4e 80 04 21 bctrl # tell function which in turn calls > floatdidf > > 10000a7c: 81 21 00 20 lwz r9,32(r1) > 10000a80: 38 60 00 00 li r3,0 > 10000a84: 7e 33 8b 78 mr r19,r17 > 10000a88: 12 49 92 e1 efdsub r18,r9,r18 > 10000a8c: 10 80 92 fa efdctsiz r4,r18 > 10000a90: 12 49 4a 17 evmr r18,r9 > > So we just save the lower 32bit of r9 before calling the function and the upper > 32bit are overwritten by efdsub. Ok, here's the e500 ABI user guide: http://www.freescale.com/files/32bit/doc/ref_manual/E500ABIUG.pdf According to that guide, we have the following usage of GPRs: Dedicated: r1 r2 r13 Nonvolatile: r14-r31 Volatile: r0 r3-r12 Then: Before a function changes the value in the upper word of any nonvolatile general register, rn, it shall save the 64-bit value in rn in the 64-bit general register save area 8*(32-n) bytes below the CR save area (plus any required padding). The 64-bit general save area shall have 8-byte alignment. And: Before a function changes the value in the lower word of any nonvolatile general register, rn, that has not already been saved in the 64-bit general register save area, it shall save the value in the lower word of rn in the word in the 32-bit general register save area 4*(32-n) bytes before the back chain word of the previous frame. So clearly the caller's assembly is wrong; it should be saving all 64-bits of r9 (volatile gpr) first. Cheers, Kyle Moffett Ok, I have a trivial 19-line testcase that triggers the bug on my native Debian GCC 4.4.4-2+powerpcspe1 (with PR44169 fix) with -O0 and -O3. The compiler was built with: --with-cpu=8548 --enable-e500_double --with-long-double-128 In order to make it work on -O3 I had to split the "double z", "void problem()" and "int main()" into 3 separate files to prevent GCC from inlining everything. Compiled with: gcc -Wall -Wextra -Werror -ggdb3 -O0 -o testall testall.c Output (the two "main: a" lines should read the same): main: a = 78.463000 problem: a = 39.231500 main: a = 39.231476 Or with: gcc -Wall -Wextra -Werror -ggdb3 -O3 -o test.o -c test.c gcc -Wall -Wextra -Werror -ggdb3 -O3 -o test2.o -c test2.c gcc -Wall -Wextra -Werror -ggdb3 -O3 -o test3.o -c test3.c gcc -Wall -Wextra -Werror -ggdb3 -O3 -o test test.o test2.o test3.o Output: (again, "main" lines should be identical) main: a = 78.463000 problem: a = 39.231500 main: a = 78.462952 As you might imagine, a bug this bad triggers testsuite failures in MPFR, GMP, PostgreSQL, lapack, and basically everything else with a comprehensive floating-point test-suite. Cheers, Kyle Moffett >So clearly the caller's assembly is wrong; it should be saving all 64-bits of
>r9 (volatile gpr) first.
Yes, that it what I've been pointing out. There is an optimization in the stack code which uses 32bit stores/loads if no double types are used.
I looks to me like the functions branch and register saves are done by the RTL and it does not know about the 64bit GPRs here. This is only a guess, I'm not a gcc guy.
Created attachment 20823 [details]
Combined trivial testcase (For -O0)
Created attachment 20824 [details]
Multipart trivial testcase (For -O3)
Created attachment 20825 [details]
Multipart trivial testcase (For -O3) part 2
Created attachment 20826 [details]
Multipart trivial testcase (For -O3) part 3
Created attachment 20827 [details]
Combined trivial testcase objdump result (Built with -O0)
Created attachment 20828 [details]
Multipart trivial testcase objdump result (Built with -O3)
Comment on attachment 20823 [details]
Combined trivial testcase (For -O0)
Scratch my test cases... "register asm(...)" doesn't work the way I thought it did... Sebastian's test case is the only one that I've found that works.
Created attachment 20829 [details]
Updated tc-lossings-floats.c
I sat down with GCC and vim for a couple hours narrowing down Sebastian's testcase. This file of ~100 lines exhibits the exact same problem as the original testcase.
Created attachment 20830 [details]
Updated tc-lossings-floats.objdump
Created attachment 20831 [details]
Even tinier tc-lossings-floats.c
Spent a bit more time on the testcase and made it even smaller. The bug goes away if I make any of the following changes:
(1) Make "counter" a nonstatic local variable or "iter" a static global
(2) Call dotell() directly instead of through a function pointer
(3) Eliminate the references to frq_out->buf_start or prx_out->buf_start, even though they are always zero.
Created attachment 20832 [details]
Even tinier tc-lossings-floats.objdump
Created attachment 20840 [details]
Minimal test with -O1
I've managed to shrink this down to a 44-line testcase that fails with the options listed below. The code is the same but the failure mode is different; Specifically, I get this output:
|RESET: 252.000000 | 4294967296.000000
|RESET: 504.000000 | 4294967296.000000
|RESET: 756.000000 | 4294967296.000000
|RESET: 1008.000000 | 4294967296.000000
|RESET: 1260.000000 | 4294967296.000000
|RESET: 1512.000000 | 4294967296.000000
|RESET: 1764.000000 | 4294967296.000000
|RESET: 2016.000000 | 4294967296.000000
|RESET: 2268.000000 | 4294967296.000000
=> 2268.000000
|RESET: 2520.000000 | 4294967296.000000
|RESET: 2772.000000 | 4294967296.000000
From the code, the second floating point value on each line should be equal to the first float from the previous line. If I change the first "(unsigned long long)" cast to a "(long long)" cast, the bug goes away and I see the following (correct) output:
|RESET: 252.000000 | 0.000000
|RESET: 504.000000 | 252.000000
|RESET: 756.000000 | 504.000000
|RESET: 1008.000000 | 756.000000
|RESET: 1260.000000 | 1008.000000
|RESET: 1512.000000 | 1260.000000
|RESET: 1764.000000 | 1512.000000
|RESET: 2016.000000 | 1764.000000
|RESET: 2268.000000 | 2016.000000
=> 2268.000000
|RESET: 2520.000000 | 2268.000000
|RESET: 2772.000000 | 2520.000000
The full list of compile options:
-pipe \
-g \
-Wall \
-fmove-loop-invariants \
-fcaller-saves \
--param max-aliased-vops=100 \
--param avg-aliased-vops=1 \
--param max-fields-for-field-sensitive=0 \
--param loop-invariant-max-bbs-in-loop=1000 \
-O \
-fno-web \
-fno-rename-registers \
-fno-unroll-loops \
-fno-peel-loops \
-fno-split-ivs-in-unroller \
-fno-peephole \
-fno-dce \
-fno-dse \
-fno-unit-at-a-time \
-fno-omit-frame-pointer \
-fno-auto-inc-dec \
-fno-cprop-registers \
-fno-dce \
-fno-defer-pop \
-fno-delayed-branch \
-fno-dse \
-fno-guess-branch-probability \
-fno-if-conversion2 \
-fno-if-conversion \
-fno-inline-small-functions \
-fno-inline-functions-called-once \
-fno-early-inlining \
-fno-ipa-pure-const \
-fno-ipa-reference \
-fno-merge-constants \
-fno-split-wide-types \
-fno-tree-builtin-call-dce \
-fno-tree-ccp \
-fno-tree-ch \
-fno-tree-copyrename \
-fno-tree-dce \
-fno-tree-dominator-opts \
-fno-tree-dse \
-fno-tree-fre \
-fno-tree-sra \
-fno-tree-ter \
-fno-merge-constants \
-fno-branch-count-reg \
-fno-function-cse \
-fno-thread-jumps \
-fno-gcse \
-fno-rerun-cse-after-loop \
-fno-cse-skip-blocks \
-fno-cse-follow-jumps
Created attachment 20841 [details]
Minimal test objdump with -O1
Confirmed. Regarding O1test.c: Wierd set of gcc options, particularly -fno-dce and -fcaller-saves. I can't see any sane reason why you would use those options on powerpc, unless you were deliberately stress testing gcc.. It's quite possible that your O1test.c is tickling an entirely different bug than in the original testcase. Trimming the options a bit, I can reproduce the O1test.c problem with only -fcaller-saves -O -fno-omit-frame-pointer -fno-dce -fno-split-wide-types My guess is that tc-lossings-floats.c hits an ira related problem, but I'm not particularly familiar with that area of the compiler so won't look further myself. I was trying to strip down the testcase and try to see which optimization flags caused it. I started from -O2 and tried to see which -O2 flags (in addition to O1) were needed to cause the problem. From there I simplified the testcase as small as I could get it and turned off as many other optimizations as possible. As far as I can tell, it seems to be a combination of -fcaller-saves and the minimal CSE that -O1 always turns on (regardless of other -fno-* options). Part of the reason for the extra -fno-* compiler options was to prevent the compiler from moving registers around so much as I fiddled with the testcase, since the bug is only triggered when it allocates one of the 2 doubles in a register that is reused by one of the libgcc functions it calls. Other than that, I'm just grasping at straws... If somebody can debug this and fix both testcases before FreeScale gets back to us with a fix I'll ship them a case or two of their favorite beer (or whatever other beverage they'd prefer). Adding the following to config/rs6000/e500.h will likely fix the bug. Testing.. #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \ (TARGET_E500_DOUBLE && ((MODE) == DFmode || (MODE) == TFmode \ || (MODE) == DCmode || (MODE) == TCmode) \ ? (MODE) \ : choose_hard_reg_mode ((REGNO), (NREGS), false)) (In reply to comment #22) > Adding the following to config/rs6000/e500.h will likely fix the bug. > Testing.. Oh, very nice! Thanks for the speedy assistance on this! I've got my own test compiler building now; I'll let you know what the result is as soon as its done! Cheers, Kyle Moffett Hmm, unfortunately in my preliminary testing this does not seem to fix either testcase. Cheers, Kyle Moffett Yes it seems the patch is not sufficient on 4.4. On mainline the code looks good by inspection. (I don't have e500 hardware to run tests on.) Doh! No, it's still broken on mainline too. I wasn't testing what I thought I was... (In reply to comment #25) > Yes it seems the patch is not sufficient on 4.4. On mainline the code looks > good by inspection. (I don't have e500 hardware to run tests on.) If you'd like login access to one of our protoboards, just post an SSH public key on bugzilla and we'll set up port-forwarded access to one of our Debian buildd servers. We're thankful for all the assistance you've been able to provide. Please bootstrap and test this addition to e500.h /* When setting up caller-save slots (MODE == VOIDmode) ensure we allocate space for DFmode. Save gprs in the correct mode too. */ #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \ (TARGET_E500_DOUBLE && ((MODE) == VOIDmode || (MODE) == DFmode) \ ? DFmode \ : choose_hard_reg_mode ((REGNO), (NREGS), false)) Awesome!!! Both of our testcases that were failing pass with this patch applied! I'm not going to call it a 100% victory yet, I want to rebuild our native compilers and build-and-run the PostgreSQL, GMP and MPFR testsuites first. Here's hoping though! *crosses fingers* Ok, the cross-compiler built with this patch fails to build a native GCC for the target with the following ICE: ../../../src/libgcc/../libdecnumber/decLibrary.c: In function 'isinfd128': ../../../src/libgcc/../libdecnumber/decLibrary.c:71: error: unrecognizable insn: (insn 6 5 7 2 ../../../src/libgcc/../libdecnumber/decLibrary.c:68 (set (subreg:TI (reg:TD 122 [ arg ]) 0) (reg:TI 123)) -1 (nil)) ../../../src/libgcc/../libdecnumber/decLibrary.c:71: internal compiler error: in extract_insn, at recog.c:2001 Alan, Now that I've corrected all of my compiler issues, your first patch (the one quoted below) works like a charm. I still need to do the extensive archive-rebuild and testsuite run to verify that it completely works, but that will take a couple days. Looking good! Cheers, Kyle Moffett (In reply to comment #28) > /* When setting up caller-save slots (MODE == VOIDmode) ensure we > allocate space for DFmode. Save gprs in the correct mode too. */ > #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \ > (TARGET_E500_DOUBLE && ((MODE) == VOIDmode || (MODE) == DFmode) \ > ? DFmode \ > : choose_hard_reg_mode ((REGNO), (NREGS), false)) The first working patch (VOIDmode||DFmode) just successfully passed full GMP and MPFR testsuites on the e500 boards. PPL and EGLIBC are currently rebuilding. EGLIBC and PPL are still building; I'm heading to sleep and I'll check on them later this morning. (In reply to comment #28) > #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \ > (TARGET_E500_DOUBLE && ((MODE) == VOIDmode || (MODE) == DFmode) \ > ? DFmode \ > : choose_hard_reg_mode ((REGNO), (NREGS), false)) > I applied it on 4.4 branch. The big Perl thing is passes its testsuite with this. Also I don't see evldd X, stw X, ld X constructs, there is just one evldd. It looks good so far. Hrm, well PPL still seems to be failing the "interval1" test, but I'm not sure that one's related as the part that fails is "test01<float>". More info to come shortly. Ok, I'm pretty sure this is unrelated to this bug, but I still get one test-failure with PPL 0.10.2. The "interval1" test fails due to the "test01<float>" subtest, apparently due to very slightly excessive "float" rounding errors (after 100 rounds of the equivalent of "x = (x + (2/x)) / 2" it approximates ~0.02% past the maximum acceptable error bounds). The "double" testcase passes just fine (although with the same error bounds and extra precision that seems perfectly reasonable). Created attachment 20873 [details]
this fails to compile in -O2 with the fix
(In reply to comment #28) > Please bootstrap and test this addition to e500.h > > /* When setting up caller-save slots (MODE == VOIDmode) ensure we > allocate space for DFmode. Save gprs in the correct mode too. */ > #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \ > (TARGET_E500_DOUBLE && ((MODE) == VOIDmode || (MODE) == DFmode) \ > ? DFmode \ > : choose_hard_reg_mode ((REGNO), (NREGS), false)) > Okay. Now I found something: inst/bin/powerpc-linux-gnuspe-gcc extract_chmLib.i -o extract_chmLib.S -S -O2 extract_chmLib.c: In function '_extract_callback': extract_chmLib.c:29: internal compiler error: in change_address_1, at emit-rtl.c:1954 Please submit a full bug report, with preprocessed source if appropriate. See <http://gcc.gnu.org/bugs.html> for instructions. extract_chmLib.i is attached. Adding -mfloat-gprs=single which avoids using 64bit gprs for double makes this go away. Hi, Kyle Moffett, in testall.c, r9 is used by a register variable, however, in E500ABI guide, r9 should be used for parameter passing, this test case seems not reasonable. Harry He with my toolchain (From CodeSourcery, 4.4-78), o1test gives correct behavior with built-in flags(-te500v2), but wrong behaviors with "-fcaller-saves -O -fno-omit-frame-pointer -fno-dce -fno-split-wide-types". Results are same even after I rebuilt the toolchain with the patch to e500.h. is there any tricks here? Created attachment 20877 [details]
e500.h and caller-save.c patch
The ICE in #38 is due to a bug in caller-save.c
(In reply to comment #41) > The ICE in #38 is due to a bug in caller-save.c Thank you for the very quick fix. (In reply to comment #39) > Hi, Kyle Moffett, > in testall.c, r9 is used by a register variable, however, in E500ABI guide, > r9 should be used for parameter passing, this test case seems not reasonable. > > Harry He Please note that this testcase was removed by Kyle (see comment #12) and that neither the original testcase (tc-lossings-float.c) nor the trimmed testcase (tc-lossings-float-3.c) make such explicit use of particular registers. Thus, it does appear that it is the compiler that is making the register assignments in question. Cheers, Mark > Thanks for your reminding, Mark. (In reply to comment #40) > with my toolchain (From CodeSourcery, 4.4-78), o1test gives correct behavior > with built-in flags(-te500v2), but wrong behaviors with "-fcaller-saves -O > -fno-omit-frame-pointer -fno-dce -fno-split-wide-types". Results are same even > after I rebuilt the toolchain with the patch to e500.h. > > is there any tricks here? > From what I can tell of CodeSourcery 4.4-78, it contains a heavily patched fork of gcc 4.4.1. This bug is reported against gcc 4.4.4. Harry, are you saying you believe this is a regression of gcc from 4.4.1 to 4.4.4? Or does CodeSourcery have a (perhaps different) patch for this issue that should be submitted here? Regards, Mark Mark, I have reported the issue to CodeSourcery and waiting them to confirm, I guess our toolchain may need different patches for it, or there is something wrong in my validation. (In reply to comment #41) > Created an attachment (id=20877) [edit] > e500.h and caller-save.c patch > > The ICE in #38 is due to a bug in caller-save.c We've basically rebuilt everything in the PowerPCSPE Debian port now, and as far as I can tell all of the test-cases affected by this bug are fixed by your attachment 20877 [details] (pr44364.diff). Now that we've gotten further, Sebastian has identified a new floating point bug (PR44606), but I believe we can call this one closed. Again, many thanks for all your help! Cheers, Kyle Moffett Subject: Bug 44364 Author: amodra Date: Tue Jun 22 10:44:11 2010 New Revision: 161163 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161163 Log: PR target/44364 * config/rs6000/e500.h (HARD_REGNO_CALLER_SAVE_MODE): Define. * caller-save.c (insert_restore, insert_save): Use non-validate form of adjust_address. Modified: trunk/gcc/ChangeLog trunk/gcc/caller-save.c trunk/gcc/config/rs6000/e500.h >Modified:
> trunk/gcc/ChangeLog
> trunk/gcc/caller-save.c
> trunk/gcc/config/rs6000/e500.h
Is it possible to get this into the 4.4 and 4.5 branch as well?
Fixed all currently active branches. |