GCC Bugzilla – Bug 11386
GNU Emacs 21.3 failed to install using GCC 3.3, but GCC 3.2.3 works.
Last modified: 2003-07-15 01:31:51 UTC
(I am opening a new bug; I posted a comment to bug 9816 earlier.)

Under UltraSparc Solaris 8,
- gcc 3.2.3 could compile emacs 21.3, and the resulting binary could compile the emacs lisp libraries and installs fine.
- gcc 3.3 could compile emacs 21.3, but the resulting binary failed to compile the emacs lisp libraries. During the compilation of one of the lisp files, the resulting binary seg-faulted!

This is very similar to what was observed in the now-invalid bug report, bug 9816. So I had to downgrade to GCC 3.2.3.

Since the compiler binary was fetched from http://www.sunfreeware.com, I don't have exact details regarding the compile options. (However, it seems to be a straightforward configure, make bootstrap ...)

Here is the detailed run log under gdb of the resulting binary.
---
This is still insufficient information, but I am posting the gdb log to solicit more input from the Emacs user community, who may have experienced similar problems using the latest GCC 3.3 under sparc solaris and other platforms.

Basically, after the binary compile of the emacs program, the emacs installer tries to compile the so-called emacs lisp programs into its own internal byte code. During the byte compiling, emacs aborts. The final gdb output is as follows (details below):

|Program received signal SIGSEGV, Segmentation fault.
|0x41e90 in __do_global_dtors_aux ()
|(gdb) #0 0x41e90 in __do_global_dtors_aux ()
|#1 0x18931c in _fini ()
|#2 0xfee9bca4 in _exithandle () from /usr/lib/libc.so.1
|#3 0xfef1f87c in exit () from /usr/lib/libc.so.1
|#4 0xd2b20 in Fkill_emacs (arg=0) at emacs.c:1830
|#5 0x137d20 in Ffuncall (nargs=1, args=0xffbee31c) at eval.c:2659

This suggests that a part of emacs gets miscompiled by GCC 3.3 under UltraSparc Solaris 8. Emacs seems to have detected something funny and tries to quit by calling the Fkill_emacs function, but by that time something (a memory [code/stack] area?) was either broken from the beginning or corrupted during the run, and during the exit processing the program seems to have hit a fatal segmentation fault.

Using GCC 3.2.3 under the same OS, the byte compilation succeeds and the installation succeeds. The resulting Emacs is usable as far as I can tell.

Attached is the problematic run, under gdb, of the Emacs binary that gets compiled using gcc 3.3. I run the particular byte compilation that triggered the abort by using a shell script, break.sh. It runs the EMACS binary under gdb.

$ cat break.sh
LC_ALL=C
LANG=C
gcc -v
cd leim
EMACSLOADPATH=/home/ishikawa/PACKAGES/emacs-21.3/leim/../lisp
export EMACSLOADPATH
# ../src/emacs -batch --no-init-file --no-site-file --multibyte -l /home/ishikawa/PACKAGES/emacs-21.3/leim/../lisp/international/titdic-cnv \
--eval '(batch-titdic-convert t)' -dir quail /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC;
***** The above backslash was followed by CR (invisible) and
***** this caused the --eval not found error below, but
***** this has nothing to do with the compilation problem.
gdb ../src/emacs <<!EOF run -batch --no-init-file --no-site-file --multibyte -l /home/ishikawa/PACKAGES/emacs-21.3/leim/../lisp/international/titdic-cnv --eval '(batch-titdic-convert t)' -dir quail /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC where quit EOF $ sh -vx break.sh LC_ALL=C LC_ALL=C LANG=C LANG=C gcc -v + gcc -v Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/3.3/specs Configured with: ../configure --disable-nls --with-as=/usr/ccs/bin/as --with-ld=/usr/ccs/bin/ld Thread model: posix gcc version 3.3 cd leim + cd leim EMACSLOADPATH=/home/ishikawa/PACKAGES/emacs-21.3/leim/../lisp EMACSLOADPATH=/home/ishikawa/PACKAGES/emacs-21.3/leim/../lisp export EMACSLOADPATH + export EMACSLOADPATH # ../src/emacs -batch --no-init-file --no-site-file --multibyte -l /home/ishikawa/PACKAGES/emacs-21.3/leim/../lisp/international/titdic-cnv \ --eval '(batch-titdic-convert t)' -dir quail /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC; + --eval (batch-titdic-convert t) -dir quail /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC break.sh: --eval: not found **** The comment line that ends with "\" was actually **** followed by CR (invisible) as I mentioned above and **** so shell complained that --eval is an unknown command. **** Now we invoke gdb. gdb ../src/emacs <<!EOF run -batch --no-init-file --no-site-file --multibyte -l /home/ishikawa/PACKAGES/emacs-21.3/leim/../lisp/international/titdic-cnv --eval '(batch-titdic-convert t)' -dir quail /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC where quit EOF + gdb ../src/emacs run -batch --no-init-file --no-site-file --multibyte -l /home/ishikawa/PACKAGES/emacs-21.3/leim/../lisp/international/titdic-cnv --eval '(batch-titdic-convert t)' -dir quail /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC where quit EOF GNU gdb 4.18 Copyright 1998 Free Software Foundation, Inc. 
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "sparc-sun-solaris2.8"... (gdb) Starting program: /home/ishikawa/PACKAGES/emacs-21.3/leim/../src/emacs -batch --no-init-file --no-site-file --multibyte -l /home/ishikawa/PACKAGES/emacs-21.3/leim/../lisp/international/titdic-cnv --eval '(batch-titdic-convert t)' -dir quail /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC Converting all tit files in the directory /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/4Corner.tit to quail-package... Decoding with coding system cn-big5... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/ARRAY30.tit to quail-package... Decoding with coding system cn-big5... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/CCDOSPY.tit to quail-package... Decoding with coding system euc-china... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/ECDICT.tit to quail-package... Decoding with coding system cn-big5... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/ETZY.tit to quail-package... Decoding with coding system cn-big5... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/PY-b5.tit to quail-package... Decoding with coding system cn-big5... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/Punct-b5.tit to quail-package... Decoding with coding system cn-big5... 
Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/Punct.tit to quail-package... Decoding with coding system euc-china... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/QJ-b5.tit to quail-package... Decoding with coding system cn-big5... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/QJ.tit to quail-package... Decoding with coding system euc-china... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/SW.tit to quail-package... Decoding with coding system euc-china... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/TONEPY.tit to quail-package... Decoding with coding system euc-china... Processing header part... Formatting translation rules... Converting /home/ishikawa/PACKAGES/emacs-21.3/leim/CXTERM-DIC/ZOZY.tit to quail-package... Decoding with coding system cn-big5... Processing header part... Formatting translation rules... Byte-compile the created files by: % emacs -batch -f batch-byte-compile XXX.el Program received signal SIGSEGV, Segmentation fault. 
0x41e90 in __do_global_dtors_aux () (gdb) #0 0x41e90 in __do_global_dtors_aux () #1 0x18931c in _fini () #2 0xfee9bca4 in _exithandle () from /usr/lib/libc.so.1 #3 0xfef1f87c in exit () from /usr/lib/libc.so.1 #4 0xd2b20 in Fkill_emacs (arg=0) at emacs.c:1830 #5 0x137d20 in Ffuncall (nargs=1, args=0xffbee31c) at eval.c:2659 #6 0x166714 in Fbyte_code (bytestr=-4267240, vector=4899592, maxdepth=6) at bytecode.c:716 #7 0x138170 in funcall_lambda (fun=1078908320, nargs=1, arg_vector=0xffbee438) ---Type <return> to continue, or q <return> to quit--- at eval.c:2851 #8 0x138020 in apply_lambda (fun=1078908320, args=1, eval_flag=1) at eval.c:2770 #9 0x136c58 in Feval (form=1346736020) at eval.c:2071 #10 0x137d20 in Ffuncall (nargs=1, args=0xffbee66c) at eval.c:2659 #11 0x166714 in Fbyte_code (bytestr=-4266392, vector=2613852, maxdepth=11) at bytecode.c:716 #12 0x138170 in funcall_lambda (fun=1076354016, nargs=1, arg_vector=0xffbee834) at eval.c:2851 #13 0x137c0c in Ffuncall (nargs=1, args=0xffbee830) at eval.c:2716 #14 0x166714 in Fbyte_code (bytestr=-4265936, vector=2604088, maxdepth=5) at bytecode.c:716 #15 0x138170 in funcall_lambda (fun=1076344624, nargs=0, arg_vector=0xffbee9e4) at eval.c:2851 #16 0x137c0c in Ffuncall (nargs=0, args=0xffbee9e0) at eval.c:2716 #17 0x166714 in Fbyte_code (bytestr=-4265504, vector=2599040, maxdepth=5) at bytecode.c:716 #18 0x138170 in funcall_lambda (fun=1076340688, nargs=0, arg_vector=0xffbeeb00) at eval.c:2851 #19 0x138020 in apply_lambda (fun=1076340688, args=0, eval_flag=1) at eval.c:2770 #20 0x136c58 in Feval (form=1345428916) at eval.c:2071 #21 0x135a3c in internal_condition_case (bfun=0xd3da0 <top_level_2>, ---Type <return> to continue, or q <return> to quit--- handlers=271329500, hfun=0xd3a5c <cmd_error>) at eval.c:1267 #22 0xd3df0 in top_level_1 () at keyboard.c:1262 #23 0x135598 in internal_catch (tag=271281828, func=0xd3db8 <top_level_1>, arg=271207428) at eval.c:1030 #24 0xd3d04 in command_loop () at keyboard.c:1223 #25 
0xd37a8 in recursive_edit_1 () at keyboard.c:950 #26 0xd390c in Frecursive_edit () at keyboard.c:1006 #27 0xd2690 in main (argc=0, argv=0xffbef104, envp=0xffbef138) at emacs.c:1547 (gdb) (gdb) (gdb) $ $ exit
I can't really investigate without a reduced testcase that exhibits the miscompilation. However, I already fixed a miscompilation of GNU tar on 32-bit Ultrasparc that was a regression from GCC 3.2.3 too, so you might want to check if the latest 3.3.1 snapshot still has the problem. Thanks.
Hi, I tried the gcc-3.3-branch. Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/3.3.1/specs Configured with: /home/ishikawa/PACKAGES/gcc-3.3-cvs/gcc/configure --enable-languages=c --program-suffix=-3.3-branch : (reconfigured) /home/ishikawa/PACKAGES/gcc-3.3-cvs/gcc/configure --enable-languages=c --program-suffix=-3.3-branch : (reconfigured) /home/ishikawa/PACKAGES/gcc-3.3-cvs/gcc/configure --enable-languages=c --program-suffix=-3.3-branch : (reconfigured) /home/ishikawa/PACKAGES/gcc-3.3-cvs/gcc/configure --enable-languages=c --program-suffix=-3.3-branch Thread model: posix gcc version 3.3.1 20030701 (prerelease) But still no go. The compilation of gnu emacs lisp file aborts: bash-2.03$ cd lisp cd lisp + cd lisp bash-2.03$ gdb ../src/bootstrap-emacs gdb ../src/bootstrap-emacs + gdb ../src/bootstrap-emacs GNU gdb 4.18 Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "sparc-sun-solaris2.8"... 
(gdb) run -batch --no-site-file --multibyte -l autoload --eval '(setq generated-autoload-file "/home/ishikawa/PACKAGES/emacs-21.3/lisp/loaddefs.el")' -f batch-update-autoloads /home/ishikawa/PACKAGES/emacs-21.3/lisp /home/ishikawa/PACKAGES/emacs-21.3/lisp/net /home/ishikawa/PACKAGES/emacs-21.3/lisp/toolbar /home/ishikawa/PACKAGES/emacs-21.3/lisp/textmodes /home/ishikawa/PACKAGES/emacs-21.3/lisp/term /home/ishikawa/PACKAGES/emacs-21.3/lisp/progmodes /home/ishikawa/PACKAGES/emacs-21.3/lisp/play /home/ishikawa/PACKAGES/emacs-21.3/lisp/obsolete /home/ishikawa/PACKAGES/emacs-21.3/lisp/language /home/ishikawa/PACKAGES/emacs-21.3/lisp/international /home/ishikawa/PACKAGES/emacs-21.3/lisp/eshell /home/ishikawa/PACKAGES/emacs-21.3/lisp/emulation /home/ishikawa/PACKAGES/emacs-21.3/lisp/emacs-lisp /home/ishikawa/PACKAGES/emacs-21.3/lisp/calendar /home/ishikawa/PACKAGES/emacs-21.3/lisp/mail /home/ishikawa/PACKAGES/emacs-21.3/lisp/gnus Undefined command: "-batch". Try "help". (gdb) run -batch --no-site-file --multibyte -l autoload --eval '(setq generated-autoload-file "/home/ishikawa/PACKAGES/emacs-21.3/lisp/loaddefs.el")' -f batch-update-autoloads /home/ishikawa/PACKAGES/emacs-21.3/lisp /home/ishikawa/PACKAGES/emacs-21.3/lisp/net /home/ishikawa/PACKAGES/emacs-21.3/lisp/toolbar /home/ishikawa/PACKAGES/emacs-21.3/lisp/textmodes /home/ishikawa/PACKAGES/emacs-21.3/lisp/term /home/ishikawa/PACKAGES/emacs-21.3/lisp/progmodes /home/ishikawa/PACKAGES/emacs-21.3/lisp/play /home/ishikawa/PACKAGES/emacs-21.3/lisp/obsolete /home/ishikawa/PACKAGES/emacs-21.3/lisp/language /home/ishikawa/PACKAGES/emacs-21.3/lisp/international /home/ishikawa/PACKAGES/emacs-21.3/lisp/eshell /home/ishikawa/PACKAGES/emacs-21.3/lisp/emulation /home/ishikawa/PACKAGES/emacs-21.3/lisp/emacs-lisp /home/ishikawa/PACKAGES/emacs-21.3/lisp/calendar /home/ishikawa/PACKAGES/emacs-21.3/lisp/mail /home/ishikawa/PACKAGES/emacs-21.3/lisp/gnus Starting program: /home/ishikawa/PACKAGES/emacs-21.3/lisp/../src/bootstrap-emacs 
-batch --no-site-file --multibyte -l autoload --eval '(setq generated-autoload-file "/home/ishikawa/PACKAGES/emacs-21.3/lisp/loaddefs.el")' -f batch-update-autoloads /home/ishikawa/PACKAGES/emacs-21.3/lisp /home/ishikawa/PACKAGES/emacs-21.3/lisp/net /home/ishikawa/PACKAGES/emacs-21.3/lisp/toolbar /home/ishikawa/PACKAGES/emacs-21.3/lisp/textmodes /home/ishikawa/PACKAGES/emacs-21.3/lisp/term /home/ishikawa/PACKAGES/emacs-21.3/lisp/progmodes /home/ishikawa/PACKAGES/emacs-21.3/lisp/play /home/ishikawa/PACKAGES/emacs-21.3/lisp/obsolete /home/ishikawa/PACKAGES/emacs-21.3/lisp/language /home/ishikawa/PACKAGES/emacs-21.3/lisp/international /home/ishikawa/PACKAGES/emacs-21.3/lisp/eshell /home/ishikawa/PACKAGES/emacs-21.3/lisp/emulation /home/ishikawa/PACKAGES/emacs-21.3/lisp/emacs-lisp /home/ishikawa/PACKAGES/emacs-21.3/lisp/calendar /home/ishikawa/PACKAGES/emacs-21.3/lisp/mail /home/ishikawa/PACKAGES/emacs-21.3/lisp/gnus (No changes need to be saved) Program received signal SIGSEGV, Segmentation fault. 
0x41de4 in __do_global_dtors_aux () (gdb) where #0 0x41de4 in __do_global_dtors_aux () #1 0x1891c4 in _fini () #2 0xfee9bca4 in _exithandle () from /usr/lib/libc.so.1 #3 0xfef1f87c in exit () from /usr/lib/libc.so.1 #4 0xd29c4 in Fkill_emacs (arg=275491892) at emacs.c:1830 #5 0x136cf8 in Feval (form=1345830528) at eval.c:2013 #6 0x1341bc in Fif (args=1345830520) at eval.c:365 #7 0x136ebc in Feval (form=1345830512) at eval.c:1960 #8 0x134378 in Fprogn (args=275491844) at eval.c:431 #9 0x138048 in funcall_lambda (fun=1345830560, nargs=0, arg_vector=0xffbedfc8) at eval.c:2844 #10 0x137ec4 in apply_lambda (fun=1345830560, args=0, eval_flag=1) at eval.c:2770 #11 0x136afc in Feval (form=1345810504) at eval.c:2071 #12 0x1355c4 in Funwind_protect (args=3) at eval.c:1125 #13 0x136ebc in Feval (form=1345810464) at eval.c:1960 #14 0x134378 in Fprogn (args=1345812324) at eval.c:431 #15 0x135040 in Flet (args=1345810456) at eval.c:875 #16 0x136ebc in Feval (form=1345810384) at eval.c:1960 #17 0x134378 in Fprogn (args=814935748) at eval.c:431 #18 0x1341bc in Fif (args=1345808848) at eval.c:365 #19 0x136ebc in Feval (form=1345808840) at eval.c:1960 #20 0x134378 in Fprogn (args=1345812340) at eval.c:431 #21 0x138048 in funcall_lambda (fun=1345812348, nargs=0, arg_vector=0xffbee7c8) at eval.c:2844 #22 0x137ec4 in apply_lambda (fun=1345812348, args=0, eval_flag=1) at eval.c:2770 #23 0x136afc in Feval (form=1350246308) at eval.c:2071 #24 0x1358e0 in internal_condition_case (bfun=0xd3c44 <top_level_2>, handlers=275609820, hfun=0xd3900 <cmd_error>) at eval.c:1267 #25 0xd3c94 in top_level_1 () at keyboard.c:1262 #26 0x13543c in internal_catch (tag=275566244, func=0xd3c5c <top_level_1>, arg=275491844) at eval.c:1030 #27 0xd3ba8 in command_loop () at keyboard.c:1223 #28 0xd364c in recursive_edit_1 () at keyboard.c:950 #29 0xd37b0 in Frecursive_edit () at keyboard.c:1006 #30 0xd2534 in main (argc=0, argv=0xffbeedcc, envp=0xffbeee38) at emacs.c:1547 (gdb) quit The program is running. 
Exit anyway? (y or n) y

Again, the program seems to detect an internal problem and calls kill-emacs (Fkill_emacs) itself, and then the exit processing somehow hits a segmentation fault.

All I can think of, from past experience, is that gcc now breaks unexelf.c, or miscompiles code somewhere that I can't pinpoint yet. (unexelf.c is the code that dumps the program's image for subsequent loading: the image has initialized application data frozen at the moment of the dump, so subsequent loading doesn't require initialization. It is very architecture specific and object format specific.)

I posted a plea for help in gnu.emacs.help, and hopefully someone more knowledgeable than I am will be able to figure out where the compilation broke.
I have also hit the same bug with gcc-3.3 on solaris-2.8. Whenever emacs exits, it coredumps:

perth 158> gdb src/emacs
GNU gdb 5.3
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.8"...
(gdb) run -nw -q
[exit emacs with C-x C-c]
Program received signal SIGSEGV, Segmentation fault.
0x00042e0c in __do_global_dtors_aux ()
(gdb) where
#0 0x00042e0c in __do_global_dtors_aux ()
#1 0x00190914 in _fini ()
#2 0xfec9bca4 in _exithandle () from /usr/lib/libc.so.1
#3 0xfed1f87c in exit () from /usr/lib/libc.so.1
#4 0x000d4ab4 in Fkill_emacs (arg=271248388) at emacs.c:1830
#5 0x00139df4 in Ffuncall (nargs=0, args=0xffbede78) at eval.c:2659
#6 0x001687e8 in Fbyte_code (bytestr=-4268280, vector=2286344, maxdepth=5) at bytecode.c:716
#7 0x0013a244 in funcall_lambda (fun=1076027988, nargs=1, arg_vector=0xffbee0dc) at eval.c:2851
#8 0x00139ce0 in Ffuncall (nargs=1, args=0xffbee0d8) at eval.c:2716
#9 0x00135014 in Fcall_interactively (function=271410140, record_flag=271248388, keys=1076633600) at callint.c:797
#10 0x000e2164 in Fcommand_execute (cmd=271410140, record_flag=271248388, keys=271248388, special=2774016) at keyboard.c:9250
#11 0x000d6320 in command_loop_1 () at keyboard.c:1661
#12 0x00137b10 in internal_condition_case (bfun=0xd5ea8 <command_loop_1>, handlers=271370460, hfun=0xd59f0 <cmd_error>) at eval.c:1267
#13 0x000d5d1c in command_loop_2 () at keyboard.c:1245
#14 0x0013766c in internal_catch (tag=271322788, func=0xd5cf8 <command_loop_2>, arg=271248388) at eval.c:1030
#15 0x000d5ca8 in command_loop () at keyboard.c:1224
#16 0x000d573c in recursive_edit_1 () at keyboard.c:950
#17 0x000d58a0 in Frecursive_edit () at keyboard.c:1006
#18 0x000d4624 in main (argc=0, argv=0xffbee834, envp=0xffbee844) at emacs.c:1547
(gdb)
emacs segfaulting at exit time after doing nothing!?

I also found that, during bootstrapping of emacs, bootstrap-emacs crashes in the same way. It is created using the following rule:

bootstrap-emacs: bootstrap-temacs bootstrap-doc
        ./temacs --batch --load loadup bootstrap
        mv -f emacs bootstrap-emacs
        rm -f temacs

This means that both executables, emacs and bootstrap-emacs, seem to be mangled AFTER unexec() in emacs-21.3/src/unexelf.c. (The non-initialized Emacs does so much initial data loading (emacs lisp libraries getting loaded) that, once the initialized internal state is reached, the image of the process is dumped into an external file, which becomes the executable file that users will use.)

So comes the question. Has there been a change between GCC 3.2.3 and GCC 3.3 in the area of memory layout of the GCC startup code (or for that matter, a generic memory layout of the GCC produced code)?

unexec(), which produces the clone of the running image, is highly dependent on the executable format and on the implicit assumptions made by the startup code, etc. That we are seeing a segmentation fault in __do_global_dtors_aux() strongly suggests that unexec() is now broken somehow.
Subject: Re: GNU Emacs 21.3 failed to install using GCC 3.3, but GCC 3.2.3 works.

> So comes the question. Has there been a change between
> GCC 3.2.3 and GCC 3.3
> in the area of memory layout of the GCC startup code (or for that matter,
> a generic memory layout of the GCC produced code)?

There have been many changes all over the place in the compiler between the 3.2.x series and the 3.3.x series, which can potentially affect every single line of the asm output. So it makes no sense to try to pinpoint one.

Try to identify the miscompilation by recompiling individual files with -O0. Then try to isolate the function in the file and recompile it with --save-temps to generate a preprocessed testcase.
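The per-file -O0 search suggested above can be done as a bisection rather than one file at a time. The following is only a minimal sketch under stated assumptions: exactly one source file is miscompiled, and the crash disappears when that file is built at -O0. The function name find_culprit and the callback still_crashes are hypothetical; the callback is expected to rebuild emacs with the given files at -O0 (the rest at the normal optimization level) and report whether the binary still crashes.

```python
def find_culprit(files, still_crashes):
    """Bisect for the one file whose -O0 rebuild makes the crash go away.

    files         -- list of source file names (assumed to contain exactly
                     one miscompiled file)
    still_crashes -- hypothetical callback: given the subset of files to
                     build at -O0, returns True if the result still crashes
    """
    candidates = list(files)
    if not candidates:
        raise ValueError("no files to bisect")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if still_crashes(half):
            # Lowering optimization for this half did not help, so the
            # culprit must be among the remaining candidates.
            candidates = candidates[len(half):]
        else:
            # The crash went away: the culprit is inside this half.
            candidates = half
    return candidates[0]


if __name__ == "__main__":
    # Stand-in for a real rebuild-and-run step; "eval.c" is picked purely
    # for illustration, not because it is known to be miscompiled.
    def fake_build(files_at_O0):
        return "eval.c" not in files_at_O0

    names = ["alloc.c", "bytecode.c", "emacs.c", "eval.c", "keyboard.c"]
    print(find_culprit(names, fake_build))
```

With about a hundred files this needs roughly seven rebuilds instead of a hundred. If the crash is not caused by optimization at all, the bisection will still return some file name, so the result has to be confirmed by rebuilding that single file at -O0.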
Re-targeted because of lack of testcase.
Hi,

Since there are about a hundred C source files under emacs-21.3/src/, it will be some time before I can pinpoint the file which caused the problem. (Not all of them are used for the solaris port, but it certainly takes time to check each file.)

One other thing, I am more inclined to think that this is a subtle startup memory layout problem. I will dig deeper into crtstuff.c, and config/sparc/* changes between 3.2.3 and 3.3.

For those unfamiliar with what unexec() does, I would suggest the following URLs.

Emacs building.
http://www.gnu.org/manual/elisp/html_node/elisp_715.html

Very early reference to the dump/unexec().
http://www.geocrawler.com/archives/3/357/1992/7/0/1995073/

XEmacs also uses unexec(). One XEmacs user bug report showing how a subtle change in the output binary format breaks dumping/unexec().
http://list-archive.xemacs.org/xemacs-patches/200005/msg00064.html

Knuth's TeX used to use dump/unexec, but no longer uses it. This is because modern hardware is deemed fast enough to read/load external data (macro packages) at each startup.

I agree that dumping a binary image using unexec() for subsequent faster startup is full of perils depending on the binary format peculiarities, and indeed it breaks when a new OS is introduced, etc. But given the popularity of UltraSparc Solaris 8, I wish we can find a solution.

As per suggestion, I will try to check if compiling some files -O0 solves the problem and in the meantime check the Emacs community to see if someone analyzes the problem and find the root cause of the problem.

Thank you again.
Subject: Re: GNU Emacs 21.3 failed to install using GCC 3.3, but GCC 3.2.3 works.

> But given the popularity of UltraSparc Solaris 8, I wish
> we can find a solution.

I wish too.

> As per suggestion, I will try to check if compiling some files -O0 solves
> the problem and in the meantime check the Emacs community to see
> if someone analyzes the problem and find the root cause of the problem.

Good idea.

It is very unfortunate that GCC 3.3 has regressed that much on SPARC with regard to GCC 3.2.3, but this was somehow unavoidable given the shortage of testing. Hopefully the 3.3.x series will stabilize shortly on this platform too.

Thanks for your help.
Subject: Re: GNU Emacs 21.3 failed to install using GCC 3.3, but GCC 3.2.3 works.

On Fri, Jul 04, 2003 at 08:59:33PM -0000, ishikawa at yk dot rim dot or dot jp wrote:
> One other thing, I am more inclined to think that
> this is a subtle startup memory layout problem.
> I will dig deeper into crtstuff.c, and config/sparc/* changes
> between 3.2.3 and 3.3.

Do you by any chance use GNU binutils, or did you do so in the past? If so, please try to upgrade to 2.13.1 and recompile shared libs that were built with an earlier version of binutils.

regards
Christian
Per question regarding gnu binutils, the answer is "No. I have not used GNU binutils" on this ultrasparc blade150. I am using the pre-installed Sun's own as, ld and installed recommended patches from sunsolve.sun.com, that is all.
You asked someone to build individual files with "-g -O0" to see which file causes the problem to occur. I've built emacs-21.3 entirely with "-g -O0" (also "-g" & "-O0" individually just in case) and it still coredumps on my SunUltra. This is with gcc-3.3, Sun ld & as:

Program received signal SIGSEGV, Segmentation fault.
0x00042e80 in __do_global_dtors_aux ()
(gdb) where
#0 0x00042e80 in __do_global_dtors_aux ()
#1 0x002cb1e0 in _fini ()
#2 0xfec9bca4 in _exithandle () from /usr/lib/libc.so.1
#3 0xfed1f87c in exit () from /usr/lib/libc.so.1
#4 0x00157fe8 in Fkill_emacs (arg=272538628) at emacs.c:1830
#5 0x00220cf8 in Ffuncall (nargs=1, args=0xffbedc20) at eval.c:2659
#6 0x00281c04 in Fbyte_code (bytestr=808881428, vector=1077317008, maxdepth=5) at bytecode.c:716
#7 0x00221ae8 in funcall_lambda (fun=1077316836, nargs=1, arg_vector=0xffbededc) at eval.c:2851
#8 0x00220fc4 in Ffuncall (nargs=2, args=0xffbeded8) at eval.c:2707
#9 0x00219700 in Fcall_interactively (function=272704476, record_flag=272538628, keys=1077923840) at callint.c:797
#10 0x00171674 in Fcommand_execute (cmd=272704476, record_flag=272538628, keys=272538628, special=272538628) at keyboard.c:9250
#11 0x0015c204 in command_loop_1 () at keyboard.c:1661
#12 0x0021cbfc in internal_condition_case (bfun=0x15a2e8 <command_loop_1>, handlers=272660700, hfun=0x159ae4 <cmd_error>) at eval.c:1267
#13 0x0015a080 in command_loop_2 () at keyboard.c:1245
#14 0x0021c47c in internal_catch (tag=272613028, func=0x15a054 <command_loop_2>, arg=272538628) at eval.c:1030
#15 0x00159ff8 in command_loop () at keyboard.c:1224
#16 0x0015964c in recursive_edit_1 () at keyboard.c:950
#17 0x001598a8 in Frecursive_edit () at keyboard.c:1006
#18 0x0015750c in main (argc=3, argv=0xffbee7c4, envp=0xffbee7d4) at emacs.c:1547
Very annoying. Did you run GDB on the generated core file?
As Simon Marshall shows in message # 11, "-g -O0" results in the same segmentation fault that I have observed. (I wrote a script to re-compile files one by one, but it would take a couple of days. His work saved me the extra work.) So I am more inclined to believe it is a startup code layout problem or something similar.

I checked with gnu.emacs.help (I should have used gnu.emacs.bugs, where developers are more likely to hang around) and it was suggested that I use objdump to look inside the generated code.

So my current plan is to use objdump to look at the following files:

temacs <- natively linked generated emacs executable.
bootstrap-emacs <- This is an image created from temacs by dump/unexec after reading some data (emacs lisp code).

I suspect that the unexec code in emacs-21.3/src/unexelf.c probably misbehaves due to the very subtle memory layout in temacs generated by GCC 3.3.

Probably I should be able to find a difference between
temacs (by GCC 3.2.3) objdump segment listing and
temacs (by GCC 3.3) objdump segment listing.

bootstrap-emacs (by GCC 3.2.3), and
bootstrap-emacs (by GCC 3.3).

If worst comes to worst, I am planning to insert a few write() calls inside crtstuff.c to print out __DTOR_LIST__, etc. inside the C startup/epilogue function in order to find out what is going on there. The function __do_global_dtors_aux() can't (shouldn't) fail if these global values are set up correctly, right? So something screws up the initialization, etc. Failing inside __do_global_dtors_aux() is exactly what I have observed myself in the local gdb sessions.

Suggestions welcome.
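One way to make the planned objdump comparison less noisy is to compare only the order of section names and ignore sizes and addresses, which differ anyway between compiler versions. This is a rough sketch, assuming the usual "objdump -h" row layout (index, name, 8-digit hex size, ...). The two sample listings are made up for illustration and are not taken from the real binaries; section_names and compare_order are hypothetical helper names.

```python
import re

# A section row in "objdump -h" output starts with an index, then the
# section name, then the size as 8 hex digits. The flags line that
# follows each row has no leading index and is therefore skipped.
SECTION_ROW = re.compile(r"^\s*\d+\s+(\S+)\s+[0-9a-fA-F]{8}\s")


def section_names(objdump_h_text):
    """Return the section names in the order objdump lists them."""
    names = []
    for line in objdump_h_text.splitlines():
        match = SECTION_ROW.match(line)
        if match:
            names.append(match.group(1))
    return names


def compare_order(before, after):
    """Return (added, removed, reordered) for two section-name lists."""
    added = [name for name in after if name not in before]
    removed = [name for name in before if name not in after]
    common_before = [name for name in before if name in after]
    common_after = [name for name in after if name in before]
    return added, removed, common_before != common_after


# Made-up listings, only to show the expected input shape.
SAMPLE_TEMACS = """\
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00001000  00010000  00010000  00000000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000100  00020000  00020000  00001000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000200  00020100  00020100  00001100  2**3
                  ALLOC
"""

SAMPLE_DUMPED = """\
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00001000  00010000  00010000  00000000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000100  00020000  00020000  00001000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  2 data          00000300  00020100  00020100  00001100  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000200  00020400  00020400  00001400  2**3
                  ALLOC
"""

if __name__ == "__main__":
    before = section_names(SAMPLE_TEMACS)
    after = section_names(SAMPLE_DUMPED)
    print(compare_order(before, after))
```

Running the same comparison on the GCC 3.2.3 pair and on the GCC 3.3 pair would show whether unexec() changes the section list differently in the two cases. Note that the membership test is by name, so two sections with an identical name would not be reported as an addition.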
Subject: Re: GNU Emacs 21.3 failed to install using GCC 3.3, but GCC 3.2.3 works. > Probably I should be able to find a difference between > temacs (by GCC 3.2.3) objdump segment listing and > temacs (by GCC 3.3) objdump segment listing. > > bootstrap-emacs (by GCC 3.2.3), and > bootstrap-emacs (by GCC 3.3). There will be very likely many differences between the code generated by these two versions of the compiler. Why not simply run GDB on the core file? Is that not doable?
I ran it under gdb, but what should I be looking for? Emacs appears to be doing the right thing; when it completes normally it calls exit(). The crash happens in __do_global_dtors_aux() and I'm unable to debug in there.
Just to clarify: temacs doesn't crash, right? Which means the bug is definitely related to the unexec/dump code?
> Probably I should be able to find a difference between > temacs (by GCC 3.2.3) objdump segment listing and > temacs (by GCC 3.3) objdump segment listing. > > bootstrap-emacs (by GCC 3.2.3), and > bootstrap-emacs (by GCC 3.3). See if you can find a difference in the section order, segment order, symbol table, or some such, which might trigger lurking unexec bugs. Differences in the actual machine code or the length thereof are not going to be meaningful due to the number of changes in code generation. :-P Looking at unexelf.c, I really want this to be a bug in it, because it's an evil file. :-)
> Just to clarify: temacs doesn't crash, right? > Which means the bug is definitely related to the unexec/dump code? Correct, temacs does not crash. BTW, on emacs-devel@gnu.org, Dhruva Krishnamurthy [seagull@fastmail.fm] said: > I had sent an email to the list: > ---------------------------------------------------------------- > I had tried building GNU Emacs with GCC 3.3 and had encountered a > problem which I have raised as a bug in GCC bugzilla(#9816). > I tried again using a newer port of GCC 3.3 and GCC 3.4 (not an official > port though) from: > http://www.thisiscool.com/gcc33_mingw.htm > > I am unable to build or even progress to get an emacs.exe (executable). > With the earlier port of GCC 3.3, I could build a bare emacs executable > which failed in compiling elisp files. > If someone has a link to a GCC 3.3 port on W2K (Win32/MinGW), please let > me know. I will try to do some extensive testing. Unfortunately, I do not > have the older GCC 3.3 port with me now! > -------------------------------------------------------------------- > As the problem is happening on Windows 2000 (W2K) platform, I feel it is > not releted to > unexec and specific to ELF. On W2K, I am not sure it uses 'ld' for > linking. So, modifications to 'ld' may not be the cause.
Hi, finally, I obtained the objdump output of temacs and bootstrap-emacs.

Here is a recap of the problems with gcc 3.3:

On UltraSparc Solaris 8, gcc 3.3 creates a bootstrap-emacs (from temacs) that crashes upon exit. gcc 3.2.3 didn't have such a problem.

gcc 3.2.3 version:
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/3.2.3/specs
Configured with: ../configure --disable-nls --with-as=/usr/ccs/bin/as --with-ld=/usr/ccs/bin/ld
Thread model: posix
gcc version 3.2.3

gcc 3.3 version:
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/3.3.1/specs
Configured with: /home/ishikawa/PACKAGES/gcc-3.3-cvs/gcc/configure --enable-languages=c --program-suffix=-3.3-branch
Thread model: posix
gcc version 3.3.1 20030703 (prerelease)

This is the analysis obtained by looking at the created binaries' ELF headers using objdump -x. (I compiled objdump by downloading binutils myself. But I didn't install the binutils package as a whole. I am using Sun's as and ld.)

Also, please bear in mind that temacs is the natively produced executable, and that it subsequently deforms itself into another executable by calling unexec() in emacs-21.3/src/unexelf.c.

temacs -- through unexec() --> bootstrap-emacs

bootstrap-emacs is created with the following shell script. Before bootstrap-emacs is created, a program "temacs" has been compiled and linked using GCC.

cd emacs-21.3/src
./temacs --batch --load loadup bootstrap
# the above creates a new executable by unexec()/dumping
# called 'emacs' and then it is renamed bootstrap-emacs
#
mv -f emacs bootstrap-emacs
mv temacs bootstrap-temacs-saved
rm -f temacs

Attached Listing
[1] objdump -x output for temacs compiled by GCC 3.2.3
[2] objdump -x output for emacs with GCC 3.2.3
[3] objdump -x output for bootstrap-emacs with GCC 3.2.3
[4] diff of [1] and [2]
[5] objdump -x output for temacs compiled by GCC 3.3-branch
[6] objdump -x output for bootstrap-emacs with GCC 3.3-branch
(Since bootstrap-emacs aborts, I could not produce emacs.)
[7] diff of [2] and [5]
[8] diff of [3] and [6]

Observation:

Comparing the bootstrap-emacs objdump outputs ([8]), I immediately noticed that the bootstrap-emacs built with GCC 3.3-branch does not have the D_PAGED flag set, whereas the (unexeced/dumped) bootstrap-emacs built with gcc 3.2.3 still has it. I wonder if this is the culprit. (But I have no idea how this came about.)

---- Quote of the relevant diff output:

Quote from [8]

*** /tmp/working-b.lst 2003 July 11 Fri
--- /tmp/b.lst 2003 July 11 Fri
***************
*** 1,9 ****
! src/bootstrap-emacs: file format elf32-sparc
! src/bootstrap-emacs
! architecture: sparc, flags 0x00000112:
! EXEC_P, HAS_SYMS, D_PAGED
! start address 0x00041d30
  Program Header:
  PHDR off 0x00000034 vaddr 0x00010034 paddr 0x00000000 align 2**0
--- 1,9 ----
! bootstrap-emacs: file format elf32-sparc
! bootstrap-emacs
! architecture: sparc, flags 0x00000012:
! EXEC_P, HAS_SYMS
! start address 0x00041d0c
  Program Header:
  PHDR off 0x00000034 vaddr 0x00010034 paddr 0x00000000 align 2**0
***************
*** 11,20 ****
  INTERP off 0x000000d4 vaddr 0x00000000 paddr 0x00000000 align 2**0
  filesz 0x00000011 memsz 0x00000000 flags r--
  LOAD off 0x00000000 vaddr 0x00010000 paddr 0x00000000 align 2**16
! filesz 0x00193f70 memsz 0x00193f70 flags r-x
! LOAD off 0x00193f70 vaddr 0x001b3f70 paddr 0x00000000 align 2**16
! filesz 0x00344090 memsz 0x00344090 flags rwx
! DYNAMIC off 0x00195264 vaddr 0x001b5264 paddr 0x00000000 align 2**0
  filesz 0x00000128 memsz 0x00000000 flags rwx
  Dynamic Section:
--- 11,20 ----
  INTERP off 0x000000d4 vaddr 0x00000000 paddr 0x00000000 align 2**0
  filesz 0x00000011 memsz 0x00000000 flags r--
  LOAD off 0x00000000 vaddr 0x00010000 paddr 0x00000000 align 2**16
! filesz 0x002c0a30 memsz 0x002c0a30 flags r-x
! LOAD off 0x002c0a30 vaddr 0x002e0a30 paddr 0x00000000 align 2**16
! filesz 0x0079d5d0 memsz 0x0079d5d0 flags rwx
! DYNAMIC off 0x002c1d18 vaddr 0x002e1d18 paddr 0x00000000 align 2**0
  filesz 0x00000128 memsz 0x00000000 flags rwx

---- end quote

As a general observation, for dumped/unexeced images under both GCC versions, we get a new "data" section between the data.rel.local and .bss sections. This was not in the original temacs, but it only reflects the nature of the data undumping done by unexec().

I attach the whole file as a zipped attachment for people's information.

Again, any insight and suggestions for further debugging steps are welcome.

The listing is attached in the next posting.
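The two flag words in the quoted diff (0x00000112 and 0x00000012) differ in a single bit. The following is only a small sketch of that comparison in C. It assumes the bit values that binutils' BFD header gives to EXEC_P (0x02), HAS_SYMS (0x10) and D_PAGED (0x100); the macro and function names are made up for this illustration and are not part of objdump or Emacs.

```c
#include <stdio.h>

/* Assumed BFD file-flag bit values (as in binutils' bfd.h).
   The BFD_ prefix on these macro names is hypothetical. */
#define BFD_EXEC_P   0x002u
#define BFD_HAS_SYMS 0x010u
#define BFD_D_PAGED  0x100u

/* Hypothetical helper: bits that are set in exactly one of the two words. */
unsigned flag_difference(unsigned a, unsigned b)
{
    return a ^ b;
}

/* Hypothetical helper: nonzero if the flag word has D_PAGED set. */
int is_demand_paged(unsigned flags)
{
    return (flags & BFD_D_PAGED) != 0;
}

/* Print the names of the three flags that appear in the thread's listings. */
void print_flags(const char *label, unsigned flags)
{
    printf("%s: 0x%08x:%s%s%s\n", label, flags,
           (flags & BFD_EXEC_P) ? " EXEC_P" : "",
           (flags & BFD_HAS_SYMS) ? " HAS_SYMS" : "",
           (flags & BFD_D_PAGED) ? " D_PAGED" : "");
}
```

Calling flag_difference(0x112, 0x012) isolates the one differing bit, and is_demand_paged() is true only for the gcc 3.2.3 value. This says nothing about why the bit is missing; it only confirms that the header flags differ in D_PAGED and nothing else.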
Created attachment 4386: listing for the previous post (objdump -x output). README.ci.gz contains the contents of the previous post and the listing of the objdump -x output.
Here's an email from an Emacs list. It looks interesting, and I guess everyone on this list working on this problem should know what is happening on the Emacs list. If you think I'm spamming this list, let me know.

From: Paul Eggert [mailto:eggert@twinsun.com]
Sent: 11 July 2003 10:55
To: rms@gnu.org
Cc: simon.marshall@misys.com; emacs-devel@gnu.org; emacs-pretesters@gnu.org
Subject: Re: Anyone built Emacs with gcc-3.3?

> From: Richard Stallman <rms@gnu.org>
> Date: Sat, 05 Jul 2003 18:25:56 -0400
>
> The way to debug this is to treat it as an Emacs bug. When you
> find the bug, it will actually be a miscompiled function (if this
> is GCC's fault). Then you can send a useful GCC bug report.

I looked into this a bit, and there seem to be at least two problems. One is an Emacs portability problem; the rest I don't know yet.

Emacs assumes that a top-level declaration like "int pure[1000] = {0};" puts "pure" into the data area. However, this assumption is no longer true with GCC 3.3 on Solaris 8, which notices that "pure" has an initializer that is all zeros, and puts "pure" into BSS instead. This is a valid optimization, so I guess Emacs should deal with it.

I checked for all static variables that Emacs 21.3 defines to be zero in Solaris 8, and which become readonly after dumping with GCC 3.2.3 but not with GCC 3.3, and I came up with the following patch to fix this portability problem. This fixes part of the bug, but not all; Emacs still dumps core. I'll see if I can investigate this further if I find more time but that won't be before this weekend.
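The placement Paul describes can be observed directly. This is a minimal sketch, not Emacs code. It assumes an ELF platform whose linker provides the edata and end symbols (documented in end(3) on Linux and Solaris), and that BSS occupies the range between them. The array names and the in_bss() helper are hypothetical.

```c
#include <stdint.h>

/* Linker-provided symbols on typical ELF systems: edata marks the end of
   the initialized data, end marks the end of BSS.  Their presence is an
   assumption about the platform, not something the C standard guarantees. */
extern char edata, end;

/* Hypothetical stand-ins for Emacs's "pure" array. */
int zero_init[1000] = {0};     /* explicit initializer, but all zeros */
int nonzero_init[1000] = {1};  /* first element is non-zero */

/* Hypothetical helper: nonzero if p lies between edata and end,
   i.e. in the region assumed here to be BSS. */
int in_bss(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&edata && a < (uintptr_t)&end;
}
```

With a compiler that places zero-initialized data in BSS by default (GCC 3.3 and later), in_bss(zero_init) should be true and in_bss(nonzero_init) false. Building the same file with -fno-zero-initialized-in-bss should move zero_init back into the data section, which is the layout Emacs 21.3 expects.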
I'm debugging it too. Here's what I've found so far:
- Emacs 21.3 doesn't segfault on Solaris 2.6,
- Emacs 21.3 does segfault on Solaris 7 and 9.

Recompiling unexelf.c with GCC 3.2.3 doesn't change anything.
The miscompiled file is lwlib/lwlib-Xaw.c
The problem is the declaration:

static Boolean actions_initted = False;

According to Paul, Emacs makes assumptions about which section the variable will be put in and this doesn't work anymore with GCC 3.3. The following patch fixes the problem:

2003-07-11 Eric Botcazou <ebotcazou@libertysurf.fr>

lwlib/lwlib-Xaw.c (actions_initted): Rename into actions_need_init and initialize to True.
(make_dialog): Use actions_need_init instead of actions_initted.

--- lwlib-Xaw.c.orig Fri Jul 11 19:40:07 2003
+++ lwlib-Xaw.c Fri Jul 11 19:41:04 2003
@@ -270,7 +270,7 @@ static void wm_delete_window();
 static XtActionsRec xaw_actions [] = {
   {"lwlib_delete_dialog", wm_delete_window}
 };
-static Boolean actions_initted = False;
+static Boolean actions_need_init = True;
 static Widget
 make_dialog (name, parent, pop_up_p, shell_title, icon_name, text_input_slot,
              radio_box, list, left_buttons, right_buttons)
@@ -299,12 +299,12 @@ make_dialog (name, parent, pop_up_p, she
   if (radio_box) abort (); /* not implemented */
   if (list) abort (); /* not implemented */
 
-  if (! actions_initted)
+  if (actions_need_init)
     {
       XtAppContext app = XtWidgetToApplicationContext (parent);
       XtAppAddActions (app, xaw_actions,
                        sizeof (xaw_actions) / sizeof (xaw_actions[0]));
-      actions_initted = True;
+      actions_need_init = False;
     }
 
   override = XtParseTranslationTable (overrideTrans);
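The idea behind the patch is to invert the sense of the flag so that its initializer is non-zero, which keeps the variable out of BSS. A stripped-down sketch of that idiom, with hypothetical names and no X toolkit dependency, might look like this:

```c
#include <stdbool.h>

/* Non-zero initializer: the compiler cannot treat this as zero-initialized
   data, so it is expected to land in the data section rather than BSS. */
static bool setup_needed = true;

/* Hypothetical counter, only here so the effect can be observed. */
static int setup_calls = 0;

/* Runs the one-time setup on the first call only; returns how many times
   the setup has run so far. */
int ensure_setup(void)
{
    if (setup_needed)
      {
        setup_calls++;         /* stands in for XtAppAddActions (...) */
        setup_needed = false;
      }
    return setup_calls;
}
```

The logic is the same as with a zero-initialized "done" flag; only the initial value, and therefore the expected section, changes. This illustrates the idiom only and is not a substitute for the patch above.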
I should have read the manual :-( It all boils down to the following remark: Emacs 21.3 must be compiled with -fno-zero-initialized-in-bss starting with GCC 3.3.
Too bad. If this is the case, then I think there is a documentation bug somewhere, in that this command line option is not easy to find from the end user's perspective. I could not find it in my copy of the documentation files in the GCC 3.3 branch CVS. (Or is it in the latest GNU Emacs CVS? I have not checked Emacs CVS, since ordinary users will use the released Emacs 21.3 and the released GCC 3.3.)

From the end user's perspective, this is what happens: GCC 3.2.3 and GNU Emacs 21.3 worked. GCC 3.3 and GNU Emacs 21.3 don't work under UltraSparc Solaris 8. Many are likely to turn to the GCC 3.3 documentation, since the GCC version is the variable here that seems to trigger the problem. So the GCC 3.3 docs should contain a clear message on this problem, IMHO.

OK, this is what I found in gcc.info after doing

find ... -type f -print | xargs egrep -i emacs
find ... -type f -print | xargs egrep -i zero-initialized-in-bss

>`-fno-zero-initialized-in-bss'
> If the target supports a BSS section, GCC by default puts
> variables that are initialized to zero into BSS. This can save
> space in the resulting code.
>
> This option turns off this behavior because some programs
> explicitly rely on variables going to the data section. E.g., so
> that the resulting executable can find the beginning of that
> section and/or make assumptions based on that.
>
> The default is `-fzero-initialized-in-bss'.

Why don't we explicitly say "some programs, notably GNU Emacs 21.3 and prior versions, explicitly rely on ... " here?

In the meantime, for those who failed to find the information on this new command line option (and it is not only me, AFAIK), there were fixes on the Emacs side. Here is a summary of what I found out. I thought we should add this information to make this bugzilla entry complete.

The problem was two-fold.
Basically, GNU Emacs makes certain assumptions about memory layout, and these were broken because GCC 3.3 by default puts zero-initialized data into the .bss segment rather than the .data segment. (This behavior can apparently be changed with the GCC command line option Eric pointed out.) Also, a subtle change in the binary output misled the undump() in unexelf.c.

Both of these problems are handled by the Emacs source file patches which Paul Eggert posted to the emacs-pretest mailing list. See the patches in the following posts.

http://mail.gnu.org/archive/html/emacs-devel/2003-07/msg00207.html
http://mail.gnu.org/archive/html/emacs-devel/2003-07/msg00219.html

A few people, including me, reported successful compilation and installation of Emacs 21.3 under UltraSparc Solaris 8 using these patches to Emacs 21.3. (unexec() is very sensitive to the memory layout of binary segments, which seems to be why Eric's patch to one Emacs file was ineffective for others and the unexelf.c patch was necessary after all.)

The patch in one of the posts generated a GCC warning, and Paul mentioned that the following change fixed the warning. (From http://mail.gnu.org/archive/html/emacs-devel/2003-07/msg00268.html)

>> Date: Mon, 14 Jul 2003 12:19:12 +0100
>>
>> alloc.c:398: warning: initialization makes pointer from integer without
>> a cast
>
>Yes, thanks: I fixed that by using the following instead in the
>version that I checked into the Emacs trunk:
>
>Lisp_Object *staticvec[NSTATICS] = {&Vpurify_flag};

So even if future Emacs users (with GCC 3.3) are not aware of the -fno-zero-initialized-in-bss option, Emacs is likely to install successfully with GCC 3.3 as long as these fixes are included.

Hope this helps.

I understand that maintaining large, complex software like GCC and GNU Emacs requires care and ongoing efforts to improve it and debug problems along the way. Also, technology transfer in the sense of educating users takes time. (And writing documentation takes time, too.)
I would like to thank the people who have made GCC accessible as free software as it is now.

Happy Hacking,
Regarding the request to include a pointer in the gcc documentation that emacs needs a particular flag to compile:

We've had similar requests in the past [1]. While in general I sympathize, the problem was that we didn't want to become the collecting point for which software needs which flag. I imagine there might be 20-50 projects out there of similar importance as emacs, and it doesn't seem appropriate if we have to document for all of them what works and what doesn't. This is particularly so since, in this case, emacs is relying on a non-standard-compliant feature of the C language, which is what allowed the gcc optimization emacs was tripping over.

It would certainly be a better choice if the emacs community either a) documented this clearly, or b) simply released a bug fix 21.3.1 for this problem. If that were the case, I consider it unlikely that people wouldn't use it if they can't get 21.3 to build.

One of the problems I see with documenting emacs' problem is that it opens the door for similar requests in the future. And while in the case of emacs we know that the bug is already fixed, we don't always know that, and would then accumulate old cruft in our documentation of which nobody has an idea whether it is still relevant.

W.

[1] with -fstrict-aliasing, for example.