I normally run nightly SPEC CPU2000 benchmark testing with mainline GCC. I've seen a few performance drops since July 29 2005. Currently I'm analysing the 197.parser benchmark, which dropped 15 points since the previous run on July 28.

I've located the following two patches and verified that they caused the drop:

1) http://gcc.gnu.org/ml/gcc-cvs/2005-07/msg01016.html - 8 point drop
2) http://gcc.gnu.org/ml/gcc-cvs/2005-07/msg01034.html - 7 point drop

After looking at the 1st patch, I found that the new assignment statement (node->global.estimated_growth = INT_MIN) in function update_caller_keys made the difference. The rest of the patch didn't matter. I've not studied the 2nd patch yet.

With only the first patch applied, four (4) of the object files in the 197.parser benchmark differ in code and size from the build without the patch. Size with the patch, relative to without it:

post-process.o: +260 bytes
prune.o:        -672 bytes
read-dict.o:    -336 bytes
utilities.o:    +80 bytes

Following is the difference in the object code of post-process.o with and without the patch.

WITH THE PATCH                        WITHOUT PATCH
<post_process>:                       <post_process>:
mflr r3                               mflr r3
stwu r1,-576(r1)                      stwu r1,-576(r1)
lis r9,0                              lis r9,0
stw r3,580(r1)                        stw r3,580(r1)
stw r18,520(r1)                       stw r18,520(r1)
stw r19,524(r1)                       stw r19,524(r1)
stw r20,528(r1)                       stw r20,528(r1)
stw r21,532(r1)                       stw r21,532(r1)
stw r22,536(r1)                       stw r22,536(r1)
stw r23,540(r1)                       stw r23,540(r1)
stw r24,544(r1)                       stw r24,544(r1)
stw r25,548(r1)                       stw r25,548(r1)
stw r26,552(r1)                       stw r26,552(r1)
stw r27,556(r1)                       stw r27,556(r1)
stw r28,560(r1)                       stw r28,560(r1)
stw r29,564(r1)                       stw r29,564(r1)
stw r30,568(r1)                       stw r30,568(r1)
stw r31,572(r1)                       stw r31,572(r1)
lwz r0,0(r9)                          lwz r0,0(r9)
cmpwi cr7,r0,0                        cmpwi cr7,r0,0
bne- cr7,5a9c <post_process+0x1ec>    bne- cr7,5a9c <post_process+0x1ec>
lis r29,0                             lis r29,0
li r3,8                               li r3,8
bl 590c <post_process+0x5c>           bl 590c <post_process+0x5c>
lwz r4,0(r29)                         lwz r4,0(r29)
mr r22,r3                           | mr r23,r3
rlwinm r3,r4,2,0,29                   rlwinm r3,r4,2,0,29
bl 591c <post_process+0x6c>           bl 591c <post_process+0x6c>
lwz r11,0(r29)                        lwz r11,0(r29)
stw r3,0(r22)                       | stw r3,0(r23)
cmpwi r11,0                           cmpwi r11,0
ble- 5a48 <post_process+0x198>        ble- 5a48 <post_process+0x198>
li r31,1                              li r31,1
li r30,0                              li r30,0
addi r10,r11,-1                       addi r10,r11,-1
cmpw cr6,r31,r11                    | cmpw cr1,r31,r11
stw r30,0(r3)                         stw r30,0(r3)
clrlwi r0,r10,29                      clrlwi r0,r10,29
beq- cr6,5a48 <post_process+0x198>  | beq- cr1,5a48 <post_process+0x198>
cmpwi cr7,r0,0                      | cmpwi r0,0
beq- cr7,59dc <post_process+0x12c>  | beq- 59dc <post_process+0x12c>
cmpwi r0,1                          | cmpwi cr7,r0,1
beq- 59c8 <post_process+0x118>      | beq- cr7,59c8 <post_process+0x118>
cmpwi cr1,r0,2                      | cmpwi cr6,r0,2
beq- cr1,59bc <post_process+0x10c>  | beq- cr6,59bc <post_process+0x10c>
cmpwi cr6,r0,3                      | cmpwi cr1,r0,3
beq- cr6,59b0 <post_process+0x100>  | beq- cr1,59b0 <post_process+0x100>
cmpwi cr7,r0,4                      | cmpwi r0,4
beq- cr7,59a4 <post_process+0xf4>   | beq- 59a4 <post_process+0xf4>
stw r30,4(r3)                         stw r30,4(r3)
li r31,2                              li r31,2
rlwinm r21,r31,2,0,29               | rlwinm r22,r31,2,0,29
addi r31,r31,1                        addi r31,r31,1
stwx r30,r21,r3                     | stwx r30,r22,r3
rlwinm r19,r31,2,0,29               | rlwinm r18,r31,2,0,29
addi r31,r31,1                        addi r31,r31,1

I have a lot more data. I've also taken dumps of the benchmark with -fdump-ipa-cgraph, with and without the patch. I'll add them later if needed. Thanks.
With the latest (05-19-2005) mainline CVS tree, the benchmark numbers with and without the ipa-inline patch (http://gcc.gnu.org/ml/gcc-cvs/2005-07/msg01016.html), compiled with flags "-O3 -m32 -mcpu=power4 -ffast-math -fpeel-loops -ftree-loop-linear -funroll-loops", are as follows:

Benchmark      with_patch   without_patch
164.gzip       401.82       404.29
175.vpr        513.52       514.93
176.gcc        677.31       682.95
181.mcf        733.14       735.16
186.crafty     492.28       493.37
197.parser     423.06       430.35
252.eon        529.79       536.55
253.perlbmk    361.01       365.51
254.gap        455.51       459.00
255.vortex     625.22       611.80
256.bzip2      536.58       535.21
300.twolf      709.13       709.86
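To make the table easier to read at a glance, here is a minimal sketch that turns each row into a percent change relative to the unpatched build and sorts the result. The scores are copied from the numbers above; the function names relative_change and report are only illustrative and are not part of any benchmark tooling.

```python
# Scores copied from the table above: (with_patch, without_patch).
SCORES = {
    "164.gzip": (401.82, 404.29),
    "175.vpr": (513.52, 514.93),
    "176.gcc": (677.31, 682.95),
    "181.mcf": (733.14, 735.16),
    "186.crafty": (492.28, 493.37),
    "197.parser": (423.06, 430.35),
    "252.eon": (529.79, 536.55),
    "253.perlbmk": (361.01, 365.51),
    "254.gap": (455.51, 459.00),
    "255.vortex": (625.22, 611.80),
    "256.bzip2": (536.58, 535.21),
    "300.twolf": (709.13, 709.86),
}


def relative_change(with_patch, without_patch):
    """Percent change of the patched score relative to the unpatched score."""
    return (with_patch - without_patch) / without_patch * 100.0


def report(scores):
    """Return (benchmark, percent change) pairs, largest drop first."""
    rows = [(name, relative_change(w, wo)) for name, (w, wo) in scores.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, pct in report(SCORES):
        print(f"{name:<12} {pct:+6.2f}%")
```

On these numbers 197.parser shows the largest relative drop (about 1.7%) and 255.vortex the largest gain (about 2.2%). Most of the other differences are below 1%.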
Both patches affect inlining decisions, and it looks like parser somehow got unlucky on PPC (they didn't cause a similar regression on parser for AMD64). It would be very useful to know which function's inlining changed and caused the difference. The inlining decisions can be dumped with -fdump-ipa-all.

Honza
Created attachment 9827
dump-ipa-all output of the affected source files
Created attachment 9828
dump-ipa-all output of the affected source files without the inlining patch
I've marked this as P2. We should try to understand the problem, but inlining heuristics are notoriously hard to get right, so it's hard to be sure whether we're seeing a real bug in the compiler, or just a situation where we got lucky before.
With the latest mainline, the performance numbers for the parser benchmark are very close to the numbers reported on July 29th 2005. On powerpc64-linux with flags "-O3 -m32 -mcpu=power4 -ffast-math -fpeel-loops -ftree-loop-linear -funroll-loops", the mainline result is 197.parser: 438 (the reported drop was to 423). Given the current numbers, I don't think I need to investigate this any further. I'm thinking of closing this bug as FIXED. Any thoughts?
Reporter says this is fixed, and nobody seems to disagree.