Bug 17972 - [3.4 Regression] const/pure functions result in bad asm
Summary: [3.4 Regression] const/pure functions result in bad asm
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 3.4.2
: P2 critical
Target Milestone: 4.0.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2004-10-13 12:47 UTC by Michal Ostrowski
Modified: 2005-11-21 10:41 UTC (History)
6 users (show)

See Also:
Host:
Target: powerpc*-linux, i686-linux
Build:
Known to work: 3.3.2 4.0.0
Known to fail: 3.4.0
Last reconfirmed: 2005-10-30 17:18:45


Attachments
testcase (210 bytes, text/plain)
2004-10-16 11:00 UTC, Alan Modra
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michal Ostrowski 2004-10-13 12:47:27 UTC
Snippet of shell session demonstrates how compiler generates bad asm (compare 
to result of compilation without -DBUG).   Toolchain built using latest
"crosstool" scripts.

Sample code attached below.  

mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-g++    -c   -o
workqueue.o wq.C -O1 -DBUG
mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-objdump -d
workqueue.o

workqueue.o:     file format elf64-powerpc

Disassembly of section .text:

0000000000000000 <._Z10queue_workP16workqueue_structP11work_struct>:
   0:   7c 08 02 a6     mflr    r0
   4:   f8 01 00 10     std     r0,16(r1)
   8:   f8 21 ff 91     stdu    r1,-112(r1)
   c:   7d a9 6b 78     mr      r9,r13
  10:   e9 29 00 00     ld      r9,0(r9)
  14:   88 09 00 03     lbz     r0,3(r9)
  18:   2f a0 00 00     cmpdi   cr7,r0,0
  1c:   40 be 00 10     bne+    cr7,2c
<._Z10queue_workP16workqueue_structP11work_struct+0x2c>
  20:   48 00 00 01     bl      20
<._Z10queue_workP16workqueue_structP11work_struct+0x20>
  24:   60 00 00 00     nop
  28:   48 00 00 00     b       28
<._Z10queue_workP16workqueue_structP11work_struct+0x28>
  2c:   48 00 00 00     b       2c
<._Z10queue_workP16workqueue_structP11work_struct+0x2c>
  30:   00 00 00 00     .long 0x0
  34:   00 09 00 01     .long 0x90001
  38:   80 00 00 00     lwz     r0,0(0)
mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-g++ -v
Reading specs from
/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/lib/gcc/powerpc64-linux/3.4.2/specs
Configured with:
/home/mostrows/tools/crosstool-0.28-rc37/build/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/gcc-3.4.2/configure
--target=powerpc64-linux --host=powerpc64-host_unknown-linux-gnu
--prefix=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3 --disable-multilib
--with-sysroot=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/powerpc64-linux/sys-root
--with-local-prefix=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/powerpc64-linux/sys-root
--disable-nls --enable-threads=posix --enable-symvers=gnu --enable-__cxa_atexit
--enable-languages=c,c++ --enable-shared --enable-c99 --enable-long-long
Thread model: posix
gcc version 3.4.2


struct LinuxVPInfo_s {
    unsigned long physProc;
};

typedef struct LinuxVPInfo_s LinuxVPInfo;

extern LinuxVPInfo linuxVPInfo;

struct cpu_workqueue_struct {
    int x;
    
};

struct workqueue_struct {
    struct cpu_workqueue_struct cpu_wq[32];
};

struct work_struct {
    unsigned long pending;
    void (*func)(void *);
    void *data;
    void *wq_data;
};

extern void __k42_spin_lock_irqsave(struct cpu_workqueue_struct* c, 
				    unsigned long *x);

struct thread_info {
    int preempt_count;
};

extern int test_and_set_bit(int x, unsigned long *y);

#ifdef BUG
static inline struct thread_info *current_thread_info(void) 
    __attribute__((const));
static int smp_processor_id(void) __attribute__((pure));
#else
static int smp_processor_id(void);
static inline struct thread_info *current_thread_info(void);
#endif


static __inline__ int smp_processor_id(void) {
    return linuxVPInfo.physProc;
}

static inline struct thread_info *current_thread_info(void)
{
 struct thread_info **ti;
 __asm__("mr %0,13" : "=r"(ti));
 return *ti;
}
#define cti current_thread_info

extern void lkBreakpoint(int line, const char* file, const char* fn);

extern void __k42disable_preempt();
extern void __k42enable_preempt();

int queue_work(struct workqueue_struct *wq, struct work_struct *work)
{
 unsigned long flags;
 int ret = 0;
 int cpu = ({ do { 
	     do { 
		 if (((cti()->preempt_count) & (((1UL << (8))-1) << 0))==0) {
		     __k42disable_preempt(); 
		     (cti()->preempt_count)++; 
		 } else 
		     (cti()->preempt_count)++; 
	     } while (0); 
	     __asm__ __volatile__("": : :"memory"); 
	 } while (0); 
	 smp_processor_id(); 
     });
 struct cpu_workqueue_struct *cwq = wq->cpu_wq + cpu;

 if (!test_and_set_bit(0, &work->pending)) {
     do { 
	 if ((!work->wq_data)) { 
	     lkBreakpoint(103,"workqueue.C",(__func__)); 
	 } 
     } while (0);
     work->wq_data = cwq;
     
     __k42_spin_lock_irqsave(cwq, &flags);
     ret = 1;
 }
 do { 
     do { 
	 __asm__ __volatile__("": : :"memory"); 
	 do { 
	     if (((cti()->preempt_count) & (((1UL << (8))-1) << 0))==1) { 
		 (cti()->preempt_count)--; 
		 __k42enable_preempt(); 
	     } else 
		 (cti()->preempt_count)--;
	 } while (0); 
     } while (0); 
 } while (0);
 return ret;
}
Comment 1 Andrew Pinski 2004-10-13 14:51:34 UTC
What do you think is wrong, attach the difference.  Also const/pure functions can changed so that they 
are only called once.  And since they are pure (or is it const) is known not to access memory, the 
barrier:
             __asm__ __volatile__("": : :"memory"); 

does not make a difference.
Comment 2 Michal Ostrowski 2004-10-13 21:45:43 UTC
Subject: Re:  const/pure functions result in
	bad asm

For starters, when compiling without -DBUG, the resulting assembly is
much longer (correct assembly, with -DBUG below, that is without
pure/const).

Basically, most of the assembly code one would expect is missing.
Further more, of the assembly output in my last mail, execution of that
code leads to one of several infinite loops, all of which take the form
of a branch instruction targeting itself.

I expect that pure/const functions to not be called more than once, the
problem is that the sample assembly in my original mail shows that the
body of the function being compiler is pretty much all gone.

On Wed, 2004-10-13 at 14:51 +0000, pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-13 14:51 -------
> What do you think is wrong, attach the difference.  Also const/pure functions can changed so that they 
> are only called once.  And since they are pure (or is it const) is known not to access memory, the 
> barrier:
>              __asm__ __volatile__("": : :"memory"); 
> 
> does not make a difference.
> 
-- 
Michal Ostrowski <mostrows@watson.ibm.com>



   0:   7c 08 02 a6     mflr    r0
   4:   fb 81 ff e0     std     r28,-32(r1)
   8:   fb a1 ff e8     std     r29,-24(r1)
   c:   fb c1 ff f0     std     r30,-16(r1)
  10:   fb e1 ff f8     std     r31,-8(r1)
  14:   f8 01 00 10     std     r0,16(r1)
  18:   f8 21 ff 61     stdu    r1,-160(r1)
  1c:   7c 7d 1b 78     mr      r29,r3
  20:   7c 9e 23 78     mr      r30,r4
  24:   3b 80 00 00     li      r28,0
  28:   7d bf 6b 78     mr      r31,r13
  2c:   e9 3f 00 00     ld      r9,0(r31)
  30:   88 09 00 03     lbz     r0,3(r9)
  34:   2f a0 00 00     cmpdi   cr7,r0,0
  38:   40 be 00 20     bne+    cr7,58
<._Z10queue_workP16workqueue_structP11work_struct+0x58>
  3c:   48 00 00 01     bl      3c
<._Z10queue_workP16workqueue_structP11work_struct+0x3c>
  40:   60 00 00 00     nop
  44:   e9 7f 00 00     ld      r11,0(r31)
  48:   81 2b 00 00     lwz     r9,0(r11)
  4c:   39 29 00 01     addi    r9,r9,1
  50:   91 2b 00 00     stw     r9,0(r11)
  54:   48 00 00 18     b       6c
<._Z10queue_workP16workqueue_structP11work_struct+0x6c>
  58:   7d a9 6b 78     mr      r9,r13
  5c:   e9 69 00 00     ld      r11,0(r9)
  60:   81 2b 00 00     lwz     r9,0(r11)
  64:   39 29 00 01     addi    r9,r9,1
  68:   91 2b 00 00     stw     r9,0(r11)
  6c:   e9 22 00 00     ld      r9,0(r2)
  70:   eb e9 00 06     lwa     r31,4(r9)
  74:   7b e0 17 64     rldicr  r0,r31,2,61
  78:   7f e0 ea 14     add     r31,r0,r29
  7c:   38 60 00 00     li      r3,0
  80:   7f c4 f3 78     mr      r4,r30
  84:   48 00 00 01     bl      84
<._Z10queue_workP16workqueue_structP11work_struct+0x84>
  88:   60 00 00 00     nop
  8c:   2f a3 00 00     cmpdi   cr7,r3,0
  90:   40 9e 00 3c     bne-    cr7,cc
<._Z10queue_workP16workqueue_structP11work_struct+0xcc>
  94:   e8 1e 00 18     ld      r0,24(r30)
  98:   2f a0 00 00     cmpdi   cr7,r0,0
  9c:   40 be 00 18     bne+    cr7,b4
<._Z10queue_workP16workqueue_structP11work_struct+0xb4>
  a0:   38 60 00 67     li      r3,103
  a4:   e8 82 00 08     ld      r4,8(r2)
  a8:   e8 a2 00 10     ld      r5,16(r2)
  ac:   48 00 00 01     bl      ac
<._Z10queue_workP16workqueue_structP11work_struct+0xac>
  b0:   60 00 00 00     nop
  b4:   fb fe 00 18     std     r31,24(r30)
  b8:   7f e3 fb 78     mr      r3,r31
  bc:   38 81 00 70     addi    r4,r1,112
  c0:   48 00 00 01     bl      c0
<._Z10queue_workP16workqueue_structP11work_struct+0xc0>
  c4:   60 00 00 00     nop
  c8:   3b 80 00 01     li      r28,1
  cc:   7d a9 6b 78     mr      r9,r13
  d0:   e9 29 00 00     ld      r9,0(r9)
  d4:   81 69 00 00     lwz     r11,0(r9)
  d8:   79 60 06 20     clrldi  r0,r11,56
  dc:   2f a0 00 01     cmpdi   cr7,r0,1
  e0:   40 be 00 18     bne+    cr7,f8
<._Z10queue_workP16workqueue_structP11work_struct+0xf8>
  e4:   38 0b ff ff     addi    r0,r11,-1
  e8:   90 09 00 00     stw     r0,0(r9)
  ec:   48 00 00 01     bl      ec
<._Z10queue_workP16workqueue_structP11work_struct+0xec>
  f0:   60 00 00 00     nop
  f4:   48 00 00 18     b       10c
<._Z10queue_workP16workqueue_structP11work_struct+0x10c>
  f8:   7d a9 6b 78     mr      r9,r13
  fc:   e9 69 00 00     ld      r11,0(r9)
 100:   81 2b 00 00     lwz     r9,0(r11)
 104:   39 29 ff ff     addi    r9,r9,-1
 108:   91 2b 00 00     stw     r9,0(r11)
 10c:   7f 83 e3 78     mr      r3,r28
 110:   38 21 00 a0     addi    r1,r1,160
 114:   e8 01 00 10     ld      r0,16(r1)
 118:   7c 08 03 a6     mtlr    r0
 11c:   eb 81 ff e0     ld      r28,-32(r1)
 120:   eb a1 ff e8     ld      r29,-24(r1)
 124:   eb c1 ff f0     ld      r30,-16(r1)
 128:   eb e1 ff f8     ld      r31,-8(r1)
 12c:   4e 80 00 20     blr
 130:   00 00 00 00     .long 0x0
 134:   00 09 00 01     .long 0x90001
 138:   80 04 00 00     lwz     r0,0(r4)

Comment 3 Andrew Pinski 2004-10-13 22:02:29 UTC
-O1 -DBUG vs -O1 on "GCC: (GNU) 3.4.1 20040619 (prerelease)" is the same.  Now with -fno-inline we 
do get different asm but the asm is correct as current_thread_info is marked as const so we only call it 
once.

Also smp_processor_id is not called at all at -O1 or -fno-inline -O1 because smp_processor_id is pure 
since pure means it does not have any side effects.
Comment 4 Alan Modra 2004-10-13 23:04:13 UTC
I see the same behaviour as Michal is reporting with 3.4.3 20041010
gcc miscompiles a do while (0) "loop" into an infinite loop.
Comment 5 Michal Ostrowski 2004-10-13 23:11:42 UTC
Subject: Re:  const/pure functions result in
	bad asm

Here is simpler code that demonstrates the problem.  Note, no loops
involved.  Further below is compilation and objdump with and without
DBUG. 

struct thread_info {
    int preempt_count;
};

#ifdef BUG
static inline struct thread_info *cti(void) 
    __attribute__((pure));
#else
static inline struct thread_info *cti(void);
#endif

static inline struct thread_info *cti(void)
{
 struct thread_info **ti;
 __asm__("mr %0,13" : "=r"(ti));
 return *ti;
}
int fn()
{
 int ret = 0;

 cti()->preempt_count++;

 ret = cti()->preempt_count++;

 return ret;
}



mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-g++    -
c   -o workqueue.o wq.C -O1 
mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-objdump
-d workqueue.o 

workqueue.o:     file format elf64-powerpc

Disassembly of section .text:

0000000000000000 <._Z2fnv>:
   0:   7d aa 6b 78     mr      r10,r13
   4:   e9 6a 00 00     ld      r11,0(r10)
   8:   81 2b 00 00     lwz     r9,0(r11)
   c:   39 29 00 01     addi    r9,r9,1
  10:   91 2b 00 00     stw     r9,0(r11)
  14:   e9 6a 00 00     ld      r11,0(r10)
  18:   81 2b 00 00     lwz     r9,0(r11)
  1c:   7d 23 07 b4     extsw   r3,r9
  20:   39 29 00 01     addi    r9,r9,1
  24:   91 2b 00 00     stw     r9,0(r11)
  28:   4e 80 00 20     blr
  2c:   00 00 00 00     .long 0x0
  30:   00 09 00 00     .long 0x90000
  34:   00 00 00 00     .long 0x0
mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-g++    -
c   -o workqueue.o wq.C -O1 -DBUG
mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-objdump
-d workqueue.o 

workqueue.o:     file format elf64-powerpc

Disassembly of section .text:

0000000000000000 <._Z2fnv>:
   0:   48 00 00 00     b       0 <._Z2fnv>
   4:   00 00 00 00     .long 0x0
   8:   00 09 00 00     .long 0x90000
   c:   00 00 00 00     .long 0x0



Comment 6 Andrew Pinski 2004-10-13 23:36:37 UTC
Hmm, I don't see it in 3.4.3 20041008 (prerelease) with a cross from powerpc-darwin to powerpc64-
unknown-linux-gnu at -O0.
Comment 7 Andrew Pinski 2004-10-13 23:37:02 UTC
I should say the -O0 is for cc1.
Comment 8 Andrew Pinski 2004-10-14 00:03:06 UTC
Also with a newest compiler (todays) I cannot reproduce this. What compiler are you starting with and is 
this with a bootstrap or a cross compiler?
Comment 9 Michal Ostrowski 2004-10-14 00:34:56 UTC
Subject: Re:  [3.4 Regression] const/pure
	functions result in bad asm

Well, it requires -O1, to address your earlier comment.

I'm using the 3.4.2 tar archive:

2fada3a3effd2fd791df09df1f1534b3  /home/mostrows/downloads/gcc-3.4.2.tar.bz2

Here's the config info:

mostrows@heater:~/tools.k42/powerpc/partDeb/os$ powerpc64-linux-gcc -v
Reading specs from /opt/crosstool/powerpc64-linux/gcc-3.4.2-
glibc-2.3.3/lib/gcc/powerpc64-linux/3.4.2/specs
Configured with: /home/mostrows/tools/crosstool-0.28-
rc37/build/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/gcc-3.4.2/configure --
target=powerpc64-linux --host=powerpc64-host_unknown-linux-gnu --
prefix=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3 --disable-
multilib --with-sysroot=/opt/crosstool/powerpc64-linux/gcc-3.4.2-
glibc-2.3.3/powerpc64-linux/sys-root --with-local-
prefix=/opt/crosstool/powerpc64-linux/gcc-3.4.2-glibc-2.3.3/powerpc64-
linux/sys-root --disable-nls --enable-threads=posix --enable-symvers=gnu
--enable-__cxa_atexit --enable-languages=c,c++ --enable-shared --enable-
c99 --enable-long-long
Thread model: posix
gcc version 3.4.2



On Thu, 2004-10-14 at 00:03 +0000, pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-14 00:03 -------
> Also with a newest compiler (todays) I cannot reproduce this. What compiler are you starting with and is 
> this with a bootstrap or a cross compiler?
> 
Comment 10 Andrew Pinski 2004-10-14 00:43:32 UTC
I am using -O1 (the compiler which I was using though was compiled with -O0).  Hmm, I still don't 
know what compiler you are still using to compile the cross compiler.
Comment 11 Michal Ostrowski 2004-10-14 00:59:30 UTC
Subject: Re:  [3.4 Regression] const/pure
	functions result in bad asm

My cross-compiler was built with:

Reading specs from /usr/lib/gcc-lib/powerpc-linux/3.3.4/specs
Configured with: ../src/configure -v --enable-languages=c,c+
+,java,f77,pascal,objc,ada --prefix=/usr --mandir=/usr/share/man --
infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --
enable-shared --with-system-zlib --enable-nls --without-included-gettext
--enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java-
gc=boehm --enable-java-awt=xlib --enable-objc-gc --disable-multilib
powerpc-linux
Thread model: posix
gcc version 3.3.4 (Debian 1:3.3.4-12)



On Thu, 2004-10-14 at 00:43 +0000, pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-14 00:43 -------
> I am using -O1 (the compiler which I was using though was compiled with -O0).  Hmm, I still don't 
> know what compiler you are still using to compile the cross compiler.
> 
Comment 12 Andrew Pinski 2004-10-14 01:44:47 UTC
This must be a bug in the 3.3.x compiler as I compiled a cross at -O2 and still not able to reproduce 
the problem.
Comment 13 Alan Modra 2004-10-15 06:47:43 UTC
Crucial detail that Andrew may have overlooked: Only happens when compiling the
testcase as c++.
Comment 14 Andrew Pinski 2004-10-15 23:12:16 UTC
Never mind you are right Alan, I do need to use cc1plus but why.  The RTL is wrong already at .01.rtl.
So this is 3.4 regression (it might be a 3.3 regression also I don't know).  But this was fixed on the 
mainline.  Why this does not effect any other target I don't know.
Comment 15 Alan Modra 2004-10-16 10:58:44 UTC
Michal's reduced testcase, in comment #5, fails on powerpc-linux too.
expand_increment is expanding the inline function body twice, at line 9261
  op0 = expand_expr (incremented, NULL_RTX, VOIDmode, 0);
and at line 9353
  temp = expand_assignment (incremented, newexp, ! post && ! ignore);
Targets such as x86 don't do the second expansion, because even though "post" is
zero due to expr.c line 8774
      /* Faster to treat as pre-increment if result is not used.  */
      return expand_increment (exp, ! ignore, ignore);
"single_insn" is true.  That suggested a slight modification to Michal's
testcase might fail on x86, and indeed it does.

So it seems this bug will hit all targets.  Hopefully the clue I've given here
about multiple expansion of the inline function body will trigger an "Ah ha!"
moment in someone who knows their way around tree-inline.c.

$ gcc/xgcc -Bgcc/ -xc++ -O -DBUG -S /src/tmp/pr17972.c
$ cat pr17972.s
        .file   "pr17972.c"
        .text
        .align 2
.globl _Z2fnv
        .type   _Z2fnv, @function
_Z2fnv:
.LFB3:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
.L2:
        jmp     .L2
.LFE3:
        .size   _Z2fnv, .-_Z2fnv
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.3 20041015 (prerelease)"
Comment 16 Alan Modra 2004-10-16 11:00:06 UTC
Created attachment 7363 [details]
testcase
Comment 17 Mark Mitchell 2004-11-01 00:45:47 UTC
Postponed until GCC 3.4.4.
Comment 18 Eric Botcazou 2004-12-06 17:15:47 UTC
Alan, unless I'm mistaken, the testcase you filed doesn't fail on x86. 
Moreover, the assembly code you posted contains an infinite loop that can't
reasonably come from the code.
Comment 19 Alan Modra 2004-12-06 23:31:55 UTC
I checked again with a current x86 gcc, 3.4.4 20041206, and the problem is still
there.  Note that this only happens with cc1plus.  As to why we get the
"impossible" assembly, what happens is that the inline function cti gets
expanded twice, but the first return label is used for both expansions:

 cti body
 jump ret_label
ret_label
 cti body
 jump ret_label

A simpler testcase is:

struct thread_info
{
  short preempt_count;
} x;

static inline struct thread_info *cti (void) __attribute__ ((const));
static inline struct thread_info *cti (void)
{
  return &x;
}

void fn (void)
{
  ++cti()->preempt_count;
}
Comment 20 Eric Botcazou 2004-12-07 08:01:08 UTC
> I checked again with a current x86 gcc, 3.4.4 20041206, and the problem is
> still there.  Note that this only happens with cc1plus.

Sorry, you're right, I was able to reproduce on i586-redhat-linux-gnu and
i586-mandrake-linux-gnu but I needed -mtune=i686.

> As to why we get the "impossible" assembly, what happens is that the inline
> function cti gets expanded twice, but the first return label is used for both
> expansions:
> 
>  cti body
>  jump ret_label
> ret_label
>  cti body
>  jump ret_label

Thanks for the analysis.  I'm going to investigate.
Comment 21 Eric Botcazou 2004-12-07 08:30:49 UTC
Investigating.
Comment 22 Andrew Pinski 2004-12-10 03:28:58 UTC
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-12/msg00722.html>.
Comment 23 GCC Commits 2004-12-15 19:15:03 UTC
Subject: Bug 17972

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	ebotcazou@gcc.gnu.org	2004-12-15 19:14:55

Modified files:
	gcc            : ChangeLog tree-inline.c 
	gcc/testsuite  : ChangeLog 
Added files:
	gcc/testsuite/g++.dg/opt: inline9.C 

Log message:
	PR c++/17972
	* tree-inline.c (expand_call_inline): Set TREE_SIDE_EFFECTS
	on the STMT_EXPR wrapping up the inlined body.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6844&r2=2.6845
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-inline.c.diff?cvsroot=gcc&r1=1.157&r2=1.158
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.4766&r2=1.4767
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/g++.dg/opt/inline9.C.diff?cvsroot=gcc&r1=NONE&r2=1.1

Comment 25 Eric Botcazou 2004-12-15 19:19:36 UTC
Patch applied.
Comment 26 Michal Ostrowski 2005-02-24 14:18:46 UTC
New test case exhibits same problem.

struct demo {
    int s;
};

extern struct demo * const *xd;

static inline struct demo *fn1(void) __attribute__((pure));
static inline struct demo *fn1(void)
{
struct demo {
    int s;
};

extern struct demo * const *xd;

static inline struct demo *fn1(void) __attribute__((pure));
static inline struct demo *fn1(void)
{
    struct demo **d;
    return *xd;
}


unsigned long foo() {
    unsigned long old = 0;
    old = (fn1()->s) & (((1UL << (8))-1) << 0);
    (fn1()->s) -= old;
    return old;
}

Here's how the generated code looks like, note the branch to self.

mostrows@heater:/tmp$ /opt/crosstool/bin/powerpc64-linux-gcc -x c++ -O -c 
/tmp/bug.C -S -o -
        .file   "bug.C"
        .section        ".text"
        .align 2
        .globl _Z3foov
        .section        ".opd","aw"
        .align 3
_Z3foov:
        .quad   ._Z3foov,.TOC.@tocbase,0
        .previous
        .size   _Z3foov,24
        .type   ._Z3foov,@function
        .globl  ._Z3foov
._Z3foov:
.LFB3:
.L3:
        b .L3
        .long 0
        .byte 0,9,0,0,0,0,0,0
.LFE3:
        .size   ._Z3foov,.-._Z3foov
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.4 20050211 (prerelease)"

Specifying language as "C" instead of C++ results in correct code generation.
Comment 27 Eric Botcazou 2005-02-24 15:59:55 UTC
> New test case exhibits same problem.

Confirmed.  Reduced testcase:

struct demo {
    int s;
};

extern struct demo * const *xd;

static inline struct demo *fn1(void) __attribute__((pure));
static inline struct demo *fn1(void)
{
    struct demo **d;
    return *xd;
}

unsigned long foo()
{
    unsigned long old = 0;
    fn1()->s -= 1;
    return old;
}

The code is correct with an explicit decrement operator.
Comment 28 Eric Botcazou 2005-02-24 16:49:48 UTC
Hum... again a consequence of the C++ front-end not setting TREE_SIDE_EFFECTS on
every CALL_EXPR, like the C front-end.  The sequence of events is as follows:

1. fn1()->s -= 1 is expanded to the 3-operand form in build_modify_expr

	{
	  /* A binary op has been requested.  Combine the old LHS
     	     value with the RHS producing the value we should actually
     	     store into the LHS.  */

	  my_friendly_assert (!PROMOTES_TO_AGGR_TYPE (lhstype, REFERENCE_TYPE),
			      978652);
	  lhs = stabilize_reference (lhs);
	  newrhs = cp_build_binary_op (modifycode, lhs, rhs);
	  if (newrhs == error_mark_node)
	    {
	      error ("  in evaluation of `%Q(%#T, %#T)'", modifycode,
		     TREE_TYPE (lhs), TREE_TYPE (rhs));
	      return error_mark_node;
	    }
	  
	  /* Now it looks like a plain assignment.  */
	  modifycode = NOP_EXPR;
	}

Note the call to stabilize_reference.  However, it doesn't stabilize anything
here because the lhs is advertised as having no side-effects.

2. The tree-inliner inlines the call.  Since the same tree is referenced twice
in the expression, the inlined body is also referenced twice is the expression
and, therefore, expanded twice to RTL.  However labels are not expanded multiple
times but reused, so the second block of RTL ends up referencing the first and
all hell breaks loose.

At this point my conclusion is that the only safe approach is that of the C
front-end.

Mark, do you recall having reviewed the initial patch (see comment #22)?  What
do you think?  Thanks in advance.
Comment 29 Mark Mitchell 2005-03-03 18:47:03 UTC
Subject: Re:  [3.4 Regression] const/pure functions result
 in bad asm

ebotcazou at gcc dot gnu dot org wrote:

> 2. The tree-inliner inlines the call.  Since the same tree is referenced twice
> in the expression, the inlined body is also referenced twice is the expression
> and, therefore, expanded twice to RTL.  However labels are not expanded multiple
> times but reused, so the second block of RTL ends up referencing the first and
> all hell breaks loose.

It really seems like the C++ front end is doing the right thing, 
abstractly -- these functions don't have side-effects!  So, either the 
inliner or stabilize reference seems like it needs fixing.  Maybe the 
latter should recognize CALL_EPRs?

If it's not possible to fix this, then what we should do is modify 
build3 to mark CALL_EXPRs as having side-effects.  Right now, it makrs 
them based on the flags for the function, so this problem isn't really 
C++-specific; it probably effects all languages except those whose front 
ends set TREE_SIDE_EFFECTS additionally themselves.  (Unless C++ is 
changing out the operands to the CALL_EXPR after its created.)

Comment 30 Eric Botcazou 2005-03-03 19:25:43 UTC
> It really seems like the C++ front end is doing the right thing, 
> abstractly -- these functions don't have side-effects!  So, either the 
> inliner or stabilize reference seems like it needs fixing.  Maybe the 
> latter should recognize CALL_EPRs?

I agree that the C++ FE is theoritically right, but other FEs appear to already
have realized that the implementation constraints trumpet the theory here: on
mainline, the C, Java and Fortran 95 FEs set TREE_SIDE_EFFECTS on CALL_EXPRs.

I think it is too late to fix the problem in the tree inliner because it would
be tricky to detect there whether a CALL_EXPR is mentioned more than once.

Special-casing CALL_EXPRs in stabilize_reference could indeed be the solution. 
We could even be clever and check whether tree inlining is enabled and, if so,
whether it has already occurred.  Does that sound plausible?

> If it's not possible to fix this, then what we should do is modify 
> build3 to mark CALL_EXPRs as having side-effects.  Right now, it makrs 
> them based on the flags for the function, so this problem isn't really 
> C++-specific; it probably effects all languages except those whose front 
> ends set TREE_SIDE_EFFECTS additionally themselves.  (Unless C++ is 
> changing out the operands to the CALL_EXPR after its created.)

Right.
Comment 31 Mark Mitchell 2005-03-03 19:34:43 UTC
Subject: Re:  [3.4 Regression] const/pure functions result
 in bad asm

ebotcazou at gcc dot gnu dot org wrote:
> ------- Additional Comments From ebotcazou at gcc dot gnu dot org  2005-03-03 19:25 -------
> 
>>It really seems like the C++ front end is doing the right thing, 
>>abstractly -- these functions don't have side-effects!  So, either the 
>>inliner or stabilize reference seems like it needs fixing.  Maybe the 
>>latter should recognize CALL_EPRs?
> 
> 
> I agree that the C++ FE is theoritically right, but other FEs appear to already
> have realized that the implementation constraints trumpet the theory here: on
> mainline, the C, Java and Fortran 95 FEs set TREE_SIDE_EFFECTS on CALL_EXPRs.

Interesting.  So, we have some checks in gcc/tree.c to try to avoid 
setting TREE_SIDE_EFFECTS, and then the FEs all set it anyhow. :-(

> Special-casing CALL_EXPRs in stabilize_reference could indeed be the solution. 
> We could even be clever and check whether tree inlining is enabled and, if so,
> whether it has already occurred.  Does that sound plausible?

I don't think I'd try to be that clever.  We might want the 
stabilization to occur even in other cases.  In looking at it more 
closely, it definitely looks like stabilize_reference should deal with 
CALL_EXPRs.

(I guess I was thinking that the problem with the duplicate labels could 
be avoided in the inliner by generating fresh labels.  But, maybe that's 
not going to work for other reasons.)

>>If it's not possible to fix this, then what we should do is modify 
>>build3 to mark CALL_EXPRs as having side-effects.  Right now, it makrs 
>>them based on the flags for the function, so this problem isn't really 
>>C++-specific; it probably effects all languages except those whose front 
>>ends set TREE_SIDE_EFFECTS additionally themselves.  (Unless C++ is 
>>changing out the operands to the CALL_EXPR after its created.)
> 
> Right.

OK, that's the fallback.

Comment 32 Eric Botcazou 2005-03-03 21:12:26 UTC
> I don't think I'd try to be that clever.  We might want the 
> stabilization to occur even in other cases.  In looking at it more 
> closely, it definitely looks like stabilize_reference should deal with 
> CALL_EXPRs.

But then what's the difference with setting TREE_SIDE_EFFECTS on CALL_EXPRs?
Also for 3.4.x some FE don't use tree inlining at all so I don't think they
should be penalized because of the C++ FE.

> (I guess I was thinking that the problem with the duplicate labels could 
> be avoided in the inliner by generating fresh labels.  But, maybe that's 
> not going to work for other reasons.)

How would you do that exactly?  In our present case, the inliner inlines exactly
one CALL_EXPR, but it is referenced twice in another tree.
Comment 33 Mark Mitchell 2005-03-03 21:19:45 UTC
Subject: Re:  [3.4 Regression] const/pure functions result
 in bad asm

ebotcazou at gcc dot gnu dot org wrote:
> ------- Additional Comments From ebotcazou at gcc dot gnu dot org  2005-03-03 21:12 -------
> 
>>I don't think I'd try to be that clever.  We might want the 
>>stabilization to occur even in other cases.  In looking at it more 
>>closely, it definitely looks like stabilize_reference should deal with 
>>CALL_EXPRs.
> 
> 
> But then what's the difference with setting TREE_SIDE_EFFECTS on CALL_EXPRs?
> Also for 3.4.x some FE don't use tree inlining at all so I don't think they
> should be penalized because of the C++ FE.

There are other places where TREE_SIDE_EFFECTS matters.  (Like, "do we 
have to emit this expression at all, if its result is not used?")

The counter to your argument is that I don't see why the C++ front end 
should be penalized because other front ends have a workaround for what 
seems to be a bug in the middle end.

The stated purpose for stabilize_reference is to make it possible to use 
an expression more than once.  If CALL_EXPRs can't be used more than 
once (which your research would seem to prove), then it seems like 
stabilize_reference should handle that.

>>(I guess I was thinking that the problem with the duplicate labels could 
>>be avoided in the inliner by generating fresh labels.  But, maybe that's 
>>not going to work for other reasons.) 
> 
> How would you do that exactly?  In our present case, the inliner inlines exactly
> one CALL_EXPR, but it is referenced twice in another tree.

OK, I'm confused, then -- I think that either we have to fix this 
problem in stabilize_reference (which seems better to me) or always set 
TREE_SIDE_EFFECTS on CALL_EXPRs in build3.

Comment 34 Eric Botcazou 2005-03-03 22:02:43 UTC
> There are other places where TREE_SIDE_EFFECTS matters.  (Like, "do we 
> have to emit this expression at all, if its result is not used?")

OK.

> The counter to your argument is that I don't see why the C++ front end 
> should be penalized because other front ends have a workaround for what 
> seems to be a bug in the middle end.

Agreed.  But I also think that the FEs that don't use tree inlining cannot even
remotely be affected by the problem, so certainly shouldn't be penalized either.

> The stated purpose for stabilize_reference is to make it possible to use 
> an expression more than once.  If CALL_EXPRs can't be used more than 
> once (which your research would seem to prove), then it seems like 
> stabilize_reference should handle that.

Sure, but I think CALL_EXPRs without TREE_SIDE_EFFECTS can be reused more than
once, once tree inlining is done, if they happen not to have been inlined.  At
least we don't have counter-examples yet.

> OK, I'm confused, then -- I think that either we have to fix this 
> problem in stabilize_reference (which seems better to me) or always set 
> TREE_SIDE_EFFECTS on CALL_EXPRs in build3.

stabilize_reference is fine with me.  We now only have to settle on the
granularity of the change.
Comment 35 Eric Botcazou 2005-03-07 09:54:29 UTC
An interesting thing in the head comment of unsafe_for_reeval in 3.4.x:

   This assumes that CALL_EXPRs and TARGET_EXPRs are never replicated in
   an expression tree, so that it safe to unsave them and the surrounding
   context will be correct.

This assumption is violated by the C++ FE.  On the other hand, unsafe_for_reeval
is gone in 4.0 so I'm not sure we really care at this point.
Comment 36 Eric Botcazou 2005-03-10 08:04:39 UTC
New patch at http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00864.html
Comment 37 Gabriel Dos Reis 2005-11-21 02:21:06 UTC
(In reply to comment #36)
> New patch at http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00864.html
> 

Eric, do you still consider this problem important to be solved for 3.4.x?
Do you have a new version of your proposed patch?

-- Gaby
Comment 38 Eric Botcazou 2005-11-21 05:16:47 UTC
> Eric, do you still consider this problem important to be solved for 3.4.x?

As the saying goes in French: "ne soyons pas plus royaliste que le roi".  The C++ maintainers apparenty don't care much, then neither do I anymore.

> Do you have a new version of your proposed patch?

No, I've stopped working on this months ago.
Comment 39 Gabriel Dos Reis 2005-11-21 10:41:08 UTC
Fixed in 4.0.0 and higher.
Won't fix for 3.4.x