Snippet of shell session demonstrates how compiler generates bad asm (compare to result of compilation without -DBUG). Toolchain built using latest "crosstool" scripts. Sample code attached below. mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-g++ -c -o workqueue.o wq.C -O1 -DBUG mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-objdump -d workqueue.o workqueue.o: file format elf64-powerpc Disassembly of section .text: 0000000000000000 <._Z10queue_workP16workqueue_structP11work_struct>: 0: 7c 08 02 a6 mflr r0 4: f8 01 00 10 std r0,16(r1) 8: f8 21 ff 91 stdu r1,-112(r1) c: 7d a9 6b 78 mr r9,r13 10: e9 29 00 00 ld r9,0(r9) 14: 88 09 00 03 lbz r0,3(r9) 18: 2f a0 00 00 cmpdi cr7,r0,0 1c: 40 be 00 10 bne+ cr7,2c <._Z10queue_workP16workqueue_structP11work_struct+0x2c> 20: 48 00 00 01 bl 20 <._Z10queue_workP16workqueue_structP11work_struct+0x20> 24: 60 00 00 00 nop 28: 48 00 00 00 b 28 <._Z10queue_workP16workqueue_structP11work_struct+0x28> 2c: 48 00 00 00 b 2c <._Z10queue_workP16workqueue_structP11work_struct+0x2c> 30: 00 00 00 00 .long 0x0 34: 00 09 00 01 .long 0x90001 38: 80 00 00 00 lwz r0,0(0) mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-g++ -v Reading specs from /opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/lib/gcc/powerpc64-linux/3.4.2/specs Configured with: /home/mostrows/tools/crosstool-0.28-rc37/build/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/gcc-3.4.2/configure --target=powerpc64-linux --host=powerpc64-host_unknown-linux-gnu --prefix=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3 --disable-multilib --with-sysroot=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/powerpc64-linux/sys-root --with-local-prefix=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/powerpc64-linux/sys-root --disable-nls --enable-threads=posix --enable-symvers=gnu --enable-__cxa_atexit --enable-languages=c,c++ --enable-shared --enable-c99 --enable-long-long Thread model: posix gcc version 3.4.2 struct LinuxVPInfo_s { unsigned long physProc; }; typedef struct LinuxVPInfo_s LinuxVPInfo; extern LinuxVPInfo linuxVPInfo; struct cpu_workqueue_struct { int x; }; struct workqueue_struct { struct cpu_workqueue_struct cpu_wq[32]; }; struct work_struct { unsigned long pending; void (*func)(void *); void *data; void *wq_data; }; extern void __k42_spin_lock_irqsave(struct cpu_workqueue_struct* c, unsigned long *x); struct thread_info { int preempt_count; }; extern int test_and_set_bit(int x, unsigned long *y); #ifdef BUG static inline struct thread_info *current_thread_info(void) __attribute__((const)); static int smp_processor_id(void) __attribute__((pure)); #else static int smp_processor_id(void); static inline struct thread_info *current_thread_info(void); #endif static __inline__ int smp_processor_id(void) { return linuxVPInfo.physProc; } static inline struct thread_info *current_thread_info(void) { struct thread_info **ti; __asm__("mr %0,13" : "=r"(ti)); return *ti; } #define cti current_thread_info extern void lkBreakpoint(int line, const char* file, const char* fn); extern void __k42disable_preempt(); extern void __k42enable_preempt(); int queue_work(struct workqueue_struct *wq, struct work_struct *work) { unsigned long flags; int ret = 0; int cpu = ({ do { do { if (((cti()->preempt_count) & (((1UL << (8))-1) << 0))==0) { __k42disable_preempt(); (cti()->preempt_count)++; } else (cti()->preempt_count)++; } while (0); __asm__ __volatile__("": : :"memory"); } while (0); smp_processor_id(); }); struct cpu_workqueue_struct *cwq = wq->cpu_wq + cpu; if (!test_and_set_bit(0, &work->pending)) { do { if ((!work->wq_data)) { lkBreakpoint(103,"workqueue.C",(__func__)); } } while (0); work->wq_data = cwq; __k42_spin_lock_irqsave(cwq, &flags); ret = 1; } do { do { __asm__ __volatile__("": : :"memory"); do { if (((cti()->preempt_count) & (((1UL << (8))-1) << 0))==1) { (cti()->preempt_count)--; __k42enable_preempt(); } else (cti()->preempt_count)--; } while (0); } while (0); } while (0); return ret; }
What do you think is wrong, attach the difference. Also const/pure functions can changed so that they are only called once. And since they are pure (or is it const) is known not to access memory, the barrier: __asm__ __volatile__("": : :"memory"); does not make a difference.
Subject: Re: const/pure functions result in bad asm For starters, when compiling without -DBUG, the resulting assembly is much longer (correct assembly, with -DBUG below, that is without pure/const). Basically, most of the assembly code one would expect is missing. Further more, of the assembly output in my last mail, execution of that code leads to one of several infinite loops, all of which take the form of a branch instruction targeting itself. I expect that pure/const functions to not be called more than once, the problem is that the sample assembly in my original mail shows that the body of the function being compiler is pretty much all gone. On Wed, 2004-10-13 at 14:51 +0000, pinskia at gcc dot gnu dot org wrote: > ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-10-13 14:51 ------- > What do you think is wrong, attach the difference. Also const/pure functions can changed so that they > are only called once. And since they are pure (or is it const) is known not to access memory, the > barrier: > __asm__ __volatile__("": : :"memory"); > > does not make a difference. > -- Michal Ostrowski <mostrows@watson.ibm.com> 0: 7c 08 02 a6 mflr r0 4: fb 81 ff e0 std r28,-32(r1) 8: fb a1 ff e8 std r29,-24(r1) c: fb c1 ff f0 std r30,-16(r1) 10: fb e1 ff f8 std r31,-8(r1) 14: f8 01 00 10 std r0,16(r1) 18: f8 21 ff 61 stdu r1,-160(r1) 1c: 7c 7d 1b 78 mr r29,r3 20: 7c 9e 23 78 mr r30,r4 24: 3b 80 00 00 li r28,0 28: 7d bf 6b 78 mr r31,r13 2c: e9 3f 00 00 ld r9,0(r31) 30: 88 09 00 03 lbz r0,3(r9) 34: 2f a0 00 00 cmpdi cr7,r0,0 38: 40 be 00 20 bne+ cr7,58 <._Z10queue_workP16workqueue_structP11work_struct+0x58> 3c: 48 00 00 01 bl 3c <._Z10queue_workP16workqueue_structP11work_struct+0x3c> 40: 60 00 00 00 nop 44: e9 7f 00 00 ld r11,0(r31) 48: 81 2b 00 00 lwz r9,0(r11) 4c: 39 29 00 01 addi r9,r9,1 50: 91 2b 00 00 stw r9,0(r11) 54: 48 00 00 18 b 6c <._Z10queue_workP16workqueue_structP11work_struct+0x6c> 58: 7d a9 6b 78 mr r9,r13 5c: e9 69 00 00 ld r11,0(r9) 60: 81 2b 00 00 lwz r9,0(r11) 64: 39 29 00 01 addi r9,r9,1 68: 91 2b 00 00 stw r9,0(r11) 6c: e9 22 00 00 ld r9,0(r2) 70: eb e9 00 06 lwa r31,4(r9) 74: 7b e0 17 64 rldicr r0,r31,2,61 78: 7f e0 ea 14 add r31,r0,r29 7c: 38 60 00 00 li r3,0 80: 7f c4 f3 78 mr r4,r30 84: 48 00 00 01 bl 84 <._Z10queue_workP16workqueue_structP11work_struct+0x84> 88: 60 00 00 00 nop 8c: 2f a3 00 00 cmpdi cr7,r3,0 90: 40 9e 00 3c bne- cr7,cc <._Z10queue_workP16workqueue_structP11work_struct+0xcc> 94: e8 1e 00 18 ld r0,24(r30) 98: 2f a0 00 00 cmpdi cr7,r0,0 9c: 40 be 00 18 bne+ cr7,b4 <._Z10queue_workP16workqueue_structP11work_struct+0xb4> a0: 38 60 00 67 li r3,103 a4: e8 82 00 08 ld r4,8(r2) a8: e8 a2 00 10 ld r5,16(r2) ac: 48 00 00 01 bl ac <._Z10queue_workP16workqueue_structP11work_struct+0xac> b0: 60 00 00 00 nop b4: fb fe 00 18 std r31,24(r30) b8: 7f e3 fb 78 mr r3,r31 bc: 38 81 00 70 addi r4,r1,112 c0: 48 00 00 01 bl c0 <._Z10queue_workP16workqueue_structP11work_struct+0xc0> c4: 60 00 00 00 nop c8: 3b 80 00 01 li r28,1 cc: 7d a9 6b 78 mr r9,r13 d0: e9 29 00 00 ld r9,0(r9) d4: 81 69 00 00 lwz r11,0(r9) d8: 79 60 06 20 clrldi r0,r11,56 dc: 2f a0 00 01 cmpdi cr7,r0,1 e0: 40 be 00 18 bne+ cr7,f8 <._Z10queue_workP16workqueue_structP11work_struct+0xf8> e4: 38 0b ff ff addi r0,r11,-1 e8: 90 09 00 00 stw r0,0(r9) ec: 48 00 00 01 bl ec <._Z10queue_workP16workqueue_structP11work_struct+0xec> f0: 60 00 00 00 nop f4: 48 00 00 18 b 10c <._Z10queue_workP16workqueue_structP11work_struct+0x10c> f8: 7d a9 6b 78 mr r9,r13 fc: e9 69 00 00 ld r11,0(r9) 100: 81 2b 00 00 lwz r9,0(r11) 104: 39 29 ff ff addi r9,r9,-1 108: 91 2b 00 00 stw r9,0(r11) 10c: 7f 83 e3 78 mr r3,r28 110: 38 21 00 a0 addi r1,r1,160 114: e8 01 00 10 ld r0,16(r1) 118: 7c 08 03 a6 mtlr r0 11c: eb 81 ff e0 ld r28,-32(r1) 120: eb a1 ff e8 ld r29,-24(r1) 124: eb c1 ff f0 ld r30,-16(r1) 128: eb e1 ff f8 ld r31,-8(r1) 12c: 4e 80 00 20 blr 130: 00 00 00 00 .long 0x0 134: 00 09 00 01 .long 0x90001 138: 80 04 00 00 lwz r0,0(r4)
-O1 -DBUG vs -O1 on "GCC: (GNU) 3.4.1 20040619 (prerelease)" is the same. Now with -fno-inline we do get different asm but the asm is correct as current_thread_info is marked as const so we only call it once. Also smp_processor_id is not called at all at -O1 or -fno-inline -O1 because smp_processor_id is pure since pure means it does not have any side effects.
I see the same behaviour as Michal is reporting with 3.4.3 20041010 gcc miscompiles a do while (0) "loop" into an infinite loop.
Subject: Re: const/pure functions result in bad asm Here is simpler code that demonstrates the problem. Note, no loops involved. Further below is compilation and objdump with and without DBUG. struct thread_info { int preempt_count; }; #ifdef BUG static inline struct thread_info *cti(void) __attribute__((pure)); #else static inline struct thread_info *cti(void); #endif static inline struct thread_info *cti(void) { struct thread_info **ti; __asm__("mr %0,13" : "=r"(ti)); return *ti; } int fn() { int ret = 0; cti()->preempt_count++; ret = cti()->preempt_count++; return ret; } mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-g++ - c -o workqueue.o wq.C -O1 mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-objdump -d workqueue.o workqueue.o: file format elf64-powerpc Disassembly of section .text: 0000000000000000 <._Z2fnv>: 0: 7d aa 6b 78 mr r10,r13 4: e9 6a 00 00 ld r11,0(r10) 8: 81 2b 00 00 lwz r9,0(r11) c: 39 29 00 01 addi r9,r9,1 10: 91 2b 00 00 stw r9,0(r11) 14: e9 6a 00 00 ld r11,0(r10) 18: 81 2b 00 00 lwz r9,0(r11) 1c: 7d 23 07 b4 extsw r3,r9 20: 39 29 00 01 addi r9,r9,1 24: 91 2b 00 00 stw r9,0(r11) 28: 4e 80 00 20 blr 2c: 00 00 00 00 .long 0x0 30: 00 09 00 00 .long 0x90000 34: 00 00 00 00 .long 0x0 mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-g++ - c -o workqueue.o wq.C -O1 -DBUG mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-objdump -d workqueue.o workqueue.o: file format elf64-powerpc Disassembly of section .text: 0000000000000000 <._Z2fnv>: 0: 48 00 00 00 b 0 <._Z2fnv> 4: 00 00 00 00 .long 0x0 8: 00 09 00 00 .long 0x90000 c: 00 00 00 00 .long 0x0
Hmm, I don't see it in 3.4.3 20041008 (prerelease) with a cross from powerpc-darwin to powerpc64- unknown-linux-gnu at -O0.
I should say the -O0 is for cc1.
Also with a newest compiler (todays) I cannot reproduce this. What compiler are you starting with and is this with a bootstrap or a cross compiler?
Subject: Re: [3.4 Regression] const/pure functions result in bad asm Well, it requires -O1, to address your earlier comment. I'm using the 3.4.2 tar archive: 2fada3a3effd2fd791df09df1f1534b3 /home/mostrows/downloads/gcc-3.4.2.tar.bz2 Here's the config info: mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-gcc -v Reading specs from /opt/crosstool/powerpc64-linux/gcc-3.4.2- glibc-2.3.3/lib/gcc/powerpc64-linux/3.4.2/specs Configured with: /home/mostrows/tools/crosstool-0.28- rc37/build/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/gcc-3.4.2/configure -- target=powerpc64-linux --host=powerpc64-host_unknown-linux-gnu -- prefix=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3 --disable- multilib --with-sysroot=/opt/crosstool/powerpc64-linux/gcc-3.4.2- glibc-2.3.3/powerpc64-linux/sys-root --with-local- prefix=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/powerpc64- linux/sys-root --disable-nls --enable-threads=posix --enable-symvers=gnu --enable-__cxa_atexit --enable-languages=c,c++ --enable-shared --enable- c99 --enable-long-long Thread model: posix gcc version 3.4.2 On Thu, 2004-10-14 at 00:03 +0000, pinskia at gcc dot gnu dot org wrote: > ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-10-14 00:03 ------- > Also with a newest compiler (todays) I cannot reproduce this. What compiler are you starting with and is > this with a bootstrap or a cross compiler? >
I am using -O1 (the compiler which I was using though was compiled with -O0). Hmm, I still don't know what compiler you are still using to compile the cross compiler.
Subject: Re: [3.4 Regression] const/pure functions result in bad asm My cross-compiler was built with: Reading specs from /usr/lib/gcc-lib/powerpc-linux/3.3.4/specs Configured with: ../src/configure -v --enable-languages=c,c+ +,java,f77,pascal,objc,ada --prefix=/usr --mandir=/usr/share/man -- infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 -- enable-shared --with-system-zlib --enable-nls --without-included-gettext --enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java- gc=boehm --enable-java-awt=xlib --enable-objc-gc --disable-multilib powerpc-linux Thread model: posix gcc version 3.3.4 (Debian 1:3.3.4-12) On Thu, 2004-10-14 at 00:43 +0000, pinskia at gcc dot gnu dot org wrote: > ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-10-14 00:43 ------- > I am using -O1 (the compiler which I was using though was compiled with -O0). Hmm, I still don't > know what compiler you are still using to compile the cross compiler. >
This must be a bug in the 3.3.x compiler as I compiled a cross at -O2 and still not able to reproduce the problem.
Crucial detail that Andrew may have overlooked: Only happens when compiling the testcase as c++.
Never mind you are right Alan, I do need to use cc1plus but why. The RTL is wrong already at .01.rtl. So this is 3.4 regression (it might be a 3.3 regression also I don't know). But this was fixed on the mainline. Why this does not effect any other target I don't know.
Michal's reduced testcase, in comment #5, fails on powerpc-linux too. expand_increment is expanding the inline function body twice, at line 9261 op0 = expand_expr (incremented, NULL_RTX, VOIDmode, 0); and at line 9353 temp = expand_assignment (incremented, newexp, ! post && ! ignore); Targets such as x86 don't do the second expansion, because even though "post" is zero due to expr.c line 8774 /* Faster to treat as pre-increment if result is not used. */ return expand_increment (exp, ! ignore, ignore); "single_insn" is true. That suggested a slight modification to Michal's testcase might fail on x86, and indeed it does. So it seems this bug will hit all targets. Hopefully the clue I've given here about multiple expansion of the inline function body will trigger an "Ah ha!" moment in someone who knows their way around tree-inline.c. $ gcc/xgcc -Bgcc/ -xc++ -O -DBUG -S /src/tmp/pr17972.c $ cat pr17972.s .file "pr17972.c" .text .align 2 .globl _Z2fnv .type _Z2fnv, @function _Z2fnv: .LFB3: pushl %ebp .LCFI0: movl %esp, %ebp .LCFI1: .L2: jmp .L2 .LFE3: .size _Z2fnv, .-_Z2fnv .section .note.GNU-stack,"",@progbits .ident "GCC: (GNU) 3.4.3 20041015 (prerelease)"
Created attachment 7363 [details] testcase
Postponed until GCC 3.4.4.
Alan, unless I'm mistaken, the testcase you filed doesn't fail on x86. Moreover, the assembly code you posted contains an infinite loop that can't reasonably come from the code.
I checked again with a current x86 gcc, 3.4.4 20041206, and the problem is still there. Note that this only happens with cc1plus. As to why we get the "impossible" assembly, what happens is that the inline function cti gets expanded twice, but the first return label is used for both expansions: cti body jump ret_label ret_label cti body jump ret_label A simpler testcase is: struct thread_info { short preempt_count; } x; static inline struct thread_info *cti (void) __attribute__ ((const)); static inline struct thread_info *cti (void) { return &x; } void fn (void) { ++cti()->preempt_count; }
> I checked again with a current x86 gcc, 3.4.4 20041206, and the problem is > still there. Note that this only happens with cc1plus. Sorry, you're right, I was able to reproduce on i586-redhat-linux-gnu and i586-mandrake-linux-gnu but I needed -mtune=i686. > As to why we get the "impossible" assembly, what happens is that the inline > function cti gets expanded twice, but the first return label is used for both > expansions: > > cti body > jump ret_label > ret_label > cti body > jump ret_label Thanks for the analysis. I'm going to investigate.
Investigating.
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-12/msg00722.html>.
Subject: Bug 17972 CVSROOT: /cvs/gcc Module name: gcc Changes by: ebotcazou@gcc.gnu.org 2004-12-15 19:14:55 Modified files: gcc : ChangeLog tree-inline.c gcc/testsuite : ChangeLog Added files: gcc/testsuite/g++.dg/opt: inline9.C Log message: PR c++/17972 * tree-inline.c (expand_call_inline): Set TREE_SIDE_EFFECTS on the STMT_EXPR wrapping up the inlined body. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6844&r2=2.6845 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-inline.c.diff?cvsroot=gcc&r1=1.157&r2=1.158 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.4766&r2=1.4767 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/g++.dg/opt/inline9.C.diff?cvsroot=gcc&r1=NONE&r2=1.1
Subject: Bug 17972 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_4-branch Changes by: ebotcazou@gcc.gnu.org 2004-12-15 19:17:57 Modified files: gcc : ChangeLog tree-inline.c gcc/testsuite : ChangeLog Added files: gcc/testsuite/g++.dg/opt: inline9.C Log message: PR c++/17972 * tree-inline.c (expand_call_inline): Set TREE_SIDE_EFFECTS on the STMT_EXPR wrapping up the inlined body. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.737&r2=2.2326.2.738 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-inline.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.90.4.5&r2=1.90.4.6 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.3389.2.326&r2=1.3389.2.327 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/g++.dg/opt/inline9.C.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=NONE&r2=1.1.2.1
Patch applied.
New test case exhibits same problem. struct demo { int s; }; extern struct demo * const *xd; static inline struct demo *fn1(void) __attribute__((pure)); static inline struct demo *fn1(void) { struct demo { int s; }; extern struct demo * const *xd; static inline struct demo *fn1(void) __attribute__((pure)); static inline struct demo *fn1(void) { struct demo **d; return *xd; } unsigned long foo() { unsigned long old = 0; old = (fn1()->s) & (((1UL << (8))-1) << 0); (fn1()->s) -= old; return old; } Here's how the generated code looks like, note the branch to self. mostrows@heater:/tmp$ /opt/crosstool/bin/powerpc64-linux-gcc -x c++ -O -c /tmp/bug.C -S -o - .file "bug.C" .section ".text" .align 2 .globl _Z3foov .section ".opd","aw" .align 3 _Z3foov: .quad ._Z3foov,.TOC.@tocbase,0 .previous .size _Z3foov,24 .type ._Z3foov,@function .globl ._Z3foov ._Z3foov: .LFB3: .L3: b .L3 .long 0 .byte 0,9,0,0,0,0,0,0 .LFE3: .size ._Z3foov,.-._Z3foov .section .note.GNU-stack,"",@progbits .ident "GCC: (GNU) 3.4.4 20050211 (prerelease)" Specifying language as "C" instead of C++ results in correct code generation.
> New test case exhibits same problem. Confirmed. Reduced testcase: struct demo { int s; }; extern struct demo * const *xd; static inline struct demo *fn1(void) __attribute__((pure)); static inline struct demo *fn1(void) { struct demo **d; return *xd; } unsigned long foo() { unsigned long old = 0; fn1()->s -= 1; return old; } The code is correct with an explicit decrement operator.
Hum... again a consequence of the C++ front-end not setting TREE_SIDE_EFFECTS on every CALL_EXPR, like the C front-end. The sequence of events is as follows: 1. fn1()->s -= 1 is expanded to the 3-operand form in build_modify_expr { /* A binary op has been requested. Combine the old LHS value with the RHS producing the value we should actually store into the LHS. */ my_friendly_assert (!PROMOTES_TO_AGGR_TYPE (lhstype, REFERENCE_TYPE), 978652); lhs = stabilize_reference (lhs); newrhs = cp_build_binary_op (modifycode, lhs, rhs); if (newrhs == error_mark_node) { error (" in evaluation of `%Q(%#T, %#T)'", modifycode, TREE_TYPE (lhs), TREE_TYPE (rhs)); return error_mark_node; } /* Now it looks like a plain assignment. */ modifycode = NOP_EXPR; } Note the call to stabilize_reference. However, it doesn't stabilize anything here because the lhs is advertised as having no side-effects. 2. The tree-inliner inlines the call. Since the same tree is referenced twice in the expression, the inlined body is also referenced twice is the expression and, therefore, expanded twice to RTL. However labels are not expanded multiple times but reused, so the second block of RTL ends up referencing the first and all hell breaks loose. At this point my conclusion is that the only safe approach is that of the C front-end. Mark, do you recall having reviewed the initial patch (see comment #22)? What do you think? Thanks in advance.
Subject: Re: [3.4 Regression] const/pure functions result in bad asm ebotcazou at gcc dot gnu dot org wrote: > 2. The tree-inliner inlines the call. Since the same tree is referenced twice > in the expression, the inlined body is also referenced twice is the expression > and, therefore, expanded twice to RTL. However labels are not expanded multiple > times but reused, so the second block of RTL ends up referencing the first and > all hell breaks loose. It really seems like the C++ front end is doing the right thing, abstractly -- these functions don't have side-effects! So, either the inliner or stabilize reference seems like it needs fixing. Maybe the latter should recognize CALL_EPRs? If it's not possible to fix this, then what we should do is modify build3 to mark CALL_EXPRs as having side-effects. Right now, it makrs them based on the flags for the function, so this problem isn't really C++-specific; it probably effects all languages except those whose front ends set TREE_SIDE_EFFECTS additionally themselves. (Unless C++ is changing out the operands to the CALL_EXPR after its created.)
> It really seems like the C++ front end is doing the right thing, > abstractly -- these functions don't have side-effects! So, either the > inliner or stabilize reference seems like it needs fixing. Maybe the > latter should recognize CALL_EPRs? I agree that the C++ FE is theoritically right, but other FEs appear to already have realized that the implementation constraints trumpet the theory here: on mainline, the C, Java and Fortran 95 FEs set TREE_SIDE_EFFECTS on CALL_EXPRs. I think it is too late to fix the problem in the tree inliner because it would be tricky to detect there whether a CALL_EXPR is mentioned more than once. Special-casing CALL_EXPRs in stabilize_reference could indeed be the solution. We could even be clever and check whether tree inlining is enabled and, if so, whether it has already occurred. Does that sound plausible? > If it's not possible to fix this, then what we should do is modify > build3 to mark CALL_EXPRs as having side-effects. Right now, it makrs > them based on the flags for the function, so this problem isn't really > C++-specific; it probably effects all languages except those whose front > ends set TREE_SIDE_EFFECTS additionally themselves. (Unless C++ is > changing out the operands to the CALL_EXPR after its created.) Right.
Subject: Re: [3.4 Regression] const/pure functions result in bad asm ebotcazou at gcc dot gnu dot org wrote: > ------- Additional Comments From ebotcazou at gcc dot gnu dot org 2005-03-03 19:25 ------- > >>It really seems like the C++ front end is doing the right thing, >>abstractly -- these functions don't have side-effects! So, either the >>inliner or stabilize reference seems like it needs fixing. Maybe the >>latter should recognize CALL_EPRs? > > > I agree that the C++ FE is theoritically right, but other FEs appear to already > have realized that the implementation constraints trumpet the theory here: on > mainline, the C, Java and Fortran 95 FEs set TREE_SIDE_EFFECTS on CALL_EXPRs. Interesting. So, we have some checks in gcc/tree.c to try to avoid setting TREE_SIDE_EFFECTS, and then the FEs all set it anyhow. :-( > Special-casing CALL_EXPRs in stabilize_reference could indeed be the solution. > We could even be clever and check whether tree inlining is enabled and, if so, > whether it has already occurred. Does that sound plausible? I don't think I'd try to be that clever. We might want the stabilization to occur even in other cases. In looking at it more closely, it definitely looks like stabilize_reference should deal with CALL_EXPRs. (I guess I was thinking that the problem with the duplicate labels could be avoided in the inliner by generating fresh labels. But, maybe that's not going to work for other reasons.) >>If it's not possible to fix this, then what we should do is modify >>build3 to mark CALL_EXPRs as having side-effects. Right now, it makrs >>them based on the flags for the function, so this problem isn't really >>C++-specific; it probably effects all languages except those whose front >>ends set TREE_SIDE_EFFECTS additionally themselves. (Unless C++ is >>changing out the operands to the CALL_EXPR after its created.) > > Right. OK, that's the fallback.
> I don't think I'd try to be that clever. We might want the > stabilization to occur even in other cases. In looking at it more > closely, it definitely looks like stabilize_reference should deal with > CALL_EXPRs. But then what's the difference with setting TREE_SIDE_EFFECTS on CALL_EXPRs? Also for 3.4.x some FE don't use tree inlining at all so I don't think they should be penalized because of the C++ FE. > (I guess I was thinking that the problem with the duplicate labels could > be avoided in the inliner by generating fresh labels. But, maybe that's > not going to work for other reasons.) How would you do that exactly? In our present case, the inliner inlines exactly one CALL_EXPR, but it is referenced twice in another tree.
Subject: Re: [3.4 Regression] const/pure functions result in bad asm ebotcazou at gcc dot gnu dot org wrote: > ------- Additional Comments From ebotcazou at gcc dot gnu dot org 2005-03-03 21:12 ------- > >>I don't think I'd try to be that clever. We might want the >>stabilization to occur even in other cases. In looking at it more >>closely, it definitely looks like stabilize_reference should deal with >>CALL_EXPRs. > > > But then what's the difference with setting TREE_SIDE_EFFECTS on CALL_EXPRs? > Also for 3.4.x some FE don't use tree inlining at all so I don't think they > should be penalized because of the C++ FE. There are other places where TREE_SIDE_EFFECTS matters. (Like, "do we have to emit this expression at all, if its result is not used?") The counter to your argument is that I don't see why the C++ front end should be penalized because other front ends have a workaround for what seems to be a bug in the middle end. The stated purpose for stabilize_reference is to make it possible to use an expression more than once. If CALL_EXPRs can't be used more than once (which your research would seem to prove), then it seems like stabilize_reference should handle that. >>(I guess I was thinking that the problem with the duplicate labels could >>be avoided in the inliner by generating fresh labels. But, maybe that's >>not going to work for other reasons.) > > How would you do that exactly? In our present case, the inliner inlines exactly > one CALL_EXPR, but it is referenced twice in another tree. OK, I'm confused, then -- I think that either we have to fix this problem in stabilize_reference (which seems better to me) or always set TREE_SIDE_EFFECTS on CALL_EXPRs in build3.
> There are other places where TREE_SIDE_EFFECTS matters. (Like, "do we > have to emit this expression at all, if its result is not used?") OK. > The counter to your argument is that I don't see why the C++ front end > should be penalized because other front ends have a workaround for what > seems to be a bug in the middle end. Agreed. But I also think that the FEs that don't use tree inlining cannot even remotely be affected by the problem, so certainly shouldn't be penalized either. > The stated purpose for stabilize_reference is to make it possible to use > an expression more than once. If CALL_EXPRs can't be used more than > once (which your research would seem to prove), then it seems like > stabilize_reference should handle that. Sure, but I think CALL_EXPRs without TREE_SIDE_EFFECTS can be reused more than once, once tree inlining is done, if they happen not to have been inlined. At least we don't have counter-examples yet. > OK, I'm confused, then -- I think that either we have to fix this > problem in stabilize_reference (which seems better to me) or always set > TREE_SIDE_EFFECTS on CALL_EXPRs in build3. stabilize_reference is fine with me. We now only have to settle on the granularity of the change.
An interesting thing in the head comment of unsafe_for_reeval in 3.4.x: This assumes that CALL_EXPRs and TARGET_EXPRs are never replicated in an expression tree, so that it safe to unsave them and the surrounding context will be correct. This assumption is violated by the C++ FE. On the other hand, unsafe_for_reeval is gone in 4.0 so I'm not sure we really care at this point.
New patch at http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00864.html
(In reply to comment #36) > New patch at http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00864.html > Eric, do you still consider this problem important to be solved for 3.4.x? Do you have a new version of your proposed patch? -- Gaby
> Eric, do you still consider this problem important to be solved for 3.4.x? As the saying goes in French: "ne soyons pas plus royaliste que le roi". The C++ maintainers apparenty don't care much, then neither do I anymore. > Do you have a new version of your proposed patch? No, I've stopped working on this months ago.
Fixed in 4.0.0 and higher. Won't fix for 3.4.x