This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches


------- Additional Comments From paolo dot bonzini at lu dot unisi dot ch  2005-08-17 20:07 -------
Subject: Re: [meta-bug] optimizations that CSE still catches


>>    unsigned outcnt;
>>    extern void flush_outbuf(void);
>>
>>    void
>>    bi_windup(unsigned char *outbuf, unsigned char bi_buf)
>>    {
>>        outbuf[outcnt] = bi_buf;
>>        if (outcnt == 16384)
>>                flush_outbuf();
>>        outbuf[outcnt] = bi_buf;
>>    }
>>    
>>
>Presumably the store into outbuf prevents the SSA optimizers from
>commonizing the first two loads of outcnt and the call to flush_outbuf
>prevents the SSA optimizers from commonizing the last load of outcnt on
>the path which bypasses the call to flush_outbuf.  Right?
>  
>
Not really.  First of all, as stevenb pointed out on IRC, this is quite 
specific to powerpc-apple-darwin and other targets where programs are 
compiled as PIC by default.  Steven's SPEC testing under Linux has not 
shown this behavior, but shared libraries there *will* suffer from the 
same problem!

We'd want the code to become

    void
    bi_windup(unsigned char *outbuf, unsigned char bi_buf)
    {
        unsigned t1 = outcnt;
        outbuf[t1] = bi_buf;
        unsigned t2 = outcnt, t3;
        if (t2 == 16384) {
            flush_outbuf();
            t3 = outcnt;    /* reload only after the call */
        } else
            t3 = t2;        /* reuse the value already loaded */
        outbuf[t3] = bi_buf;
    }
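
As a sanity check on the transformation above, here is a minimal sketch
that runs the original and the transformed function side by side and
compares the results.  The names bi_windup_orig, bi_windup_pre,
same_result, flush_calls and OUTBUF_SIZE are made up for illustration,
and the body of flush_outbuf is a stand-in (the real one is external and
opaque to the compiler).  The temporaries are declared unsigned here to
match outcnt.

```c
#include <string.h>

enum { OUTBUF_SIZE = 16385 };   /* assumption: index 16384 must be valid */

unsigned outcnt;
unsigned flush_calls;           /* hypothetical counter, for checking only */

/* Hypothetical stand-in for the external flush_outbuf. */
void flush_outbuf(void)
{
    outcnt = 0;
    flush_calls++;
}

/* The function as originally written: three loads of outcnt. */
void bi_windup_orig(unsigned char *outbuf, unsigned char bi_buf)
{
    outbuf[outcnt] = bi_buf;
    if (outcnt == 16384)
        flush_outbuf();
    outbuf[outcnt] = bi_buf;
}

/* The desired form: the last load happens only on the path through the call. */
void bi_windup_pre(unsigned char *outbuf, unsigned char bi_buf)
{
    unsigned t1 = outcnt;
    outbuf[t1] = bi_buf;
    /* Still reloaded here; presumably because a store through
       unsigned char * may alias outcnt. */
    unsigned t2 = outcnt, t3;
    if (t2 == 16384) {
        flush_outbuf();
        t3 = outcnt;
    } else
        t3 = t2;
    outbuf[t3] = bi_buf;
}

/* Returns 1 if both versions leave identical buffers and outcnt values. */
int same_result(unsigned start)
{
    static unsigned char a[OUTBUF_SIZE], b[OUTBUF_SIZE];
    unsigned end_a, end_b;

    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    outcnt = start; bi_windup_orig(a, 0x5a); end_a = outcnt;
    outcnt = start; bi_windup_pre(b, 0x5a);  end_b = outcnt;
    return end_a == end_b && memcmp(a, b, sizeof a) == 0;
}
```

Calling same_result(16384) exercises the path through flush_outbuf, and
any other in-range start value exercises the path that bypasses it.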


If we disable CSE path following and keep only one GCSE pass, we "waste" 
that pass: it only generates temporaries for the partially redundant 
address of outcnt, so the optimization above is missed.  With two GCSE 
passes, the second is able to eliminate the partially redundant load.
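
One way to picture this, as a source-level analogy only (GCSE works on
RTL, and this is my reading of the paragraph above, not a dump of what
the passes emit): under PIC every access to outcnt first computes its
address.  The sketch below models that with a hypothetical outcnt_addr()
helper and counts address computations and loads; load(), the counters
and the windup_stage* names are likewise invented for illustration.

```c
unsigned outcnt;
unsigned address_computations;  /* hypothetical counters */
unsigned loads;

/* Hypothetical model of a PIC address computation for outcnt. */
static unsigned *outcnt_addr(void)
{
    address_computations++;
    return &outcnt;
}

/* Counts each load of outcnt through a pointer. */
static unsigned load(const unsigned *p)
{
    loads++;
    return *p;
}

/* Stand-in for the external flush_outbuf. */
void flush_outbuf(void) { outcnt = 0; }

/* Before GCSE: address recomputed for every access. */
void windup_stage0(unsigned char *outbuf, unsigned char bi_buf)
{
    outbuf[load(outcnt_addr())] = bi_buf;
    if (load(outcnt_addr()) == 16384)
        flush_outbuf();
    outbuf[load(outcnt_addr())] = bi_buf;
}

/* After one pass (as modelled here): the address lives in a temporary,
   but all three loads remain. */
void windup_stage1(unsigned char *outbuf, unsigned char bi_buf)
{
    unsigned *p = outcnt_addr();
    outbuf[load(p)] = bi_buf;
    if (load(p) == 16384)
        flush_outbuf();
    outbuf[load(p)] = bi_buf;
}

/* After a second pass (as modelled here): the last load is done only on
   the path through the call. */
void windup_stage2(unsigned char *outbuf, unsigned char bi_buf)
{
    unsigned *p = outcnt_addr();
    outbuf[load(p)] = bi_buf;
    unsigned t2 = load(p), t3;
    if (t2 == 16384) {
        flush_outbuf();
        t3 = load(p);
    } else
        t3 = t2;
    outbuf[t3] = bi_buf;
}
```

On the path that bypasses the call, the model goes from three address
computations and three loads, to one and three, to one and two.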

Of course, what we really miss is load PRE at the tree level, but it is 
good that --param max-gcse-passes=2 can replace 
-fcse-skip-blocks -fcse-follow-jumps.  Testing mainline GCC against a 
patch that combines no path following, 2 GCSE passes, and my forward 
propagation pass, I'm seeing SPEC improvements of +2 to +8% on 
powerpc-apple-darwin.

Paolo


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721

