Rev 147852, which claims (according to http://gcc.gnu.org/ml/gcc-patches/2009-05/msg00812.html) to improve code size, really makes things worse by ~0.5% for ARM, thumb and thumb2 code when building CSiBE.
*** Bug 40437 has been marked as a duplicate of this bug. ***
Hmm, the measurements are most likely on x86 which might have slight differences when it comes to code size differences than arm.
For ARM -Os 114 files in CSiBE increase in size (total increase 21449 bytes) 20 files decrease in size (total decreases 1039 bytes); over all increase 20410 bytes) Worst single increase is from bzip2/compress (increase from 12495 to 19463 bytes).
Confirmed that the problem exists.
The difference essentially is because all functions getting inlined into BZ2_Compress_Block with the newer heuristics.
Created attachment 18100 [details] Proposed patch The problem is that early inliner allows to increase code size estimate by inlining single function by up to 12 instructions. This is higher than on pretty-ipa branch still, since we are not that good on early optimizing yet and some C++ benchmarks (tramp/botan/boost) degrade when reduced to 7 as used by tramp3d. In tramp3d it is mostly caused by dead loops in constructors, and I hope that merging IPA-SRA and CD-DCE improvements will care this on all three benchmarks. At -O2 early inliner needs to be somewhat speculative since it don't know the function profiles yet. It however seems stupid to allow code size growth at -Os in general. This patch changes it reducing compress.o from 14k to 9k on i386. I guess I will need to give it full CSiBE run since it is question whether little speculative inlining won't result in better overall score. Honza
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 > The problem is that early inliner allows to increase code size estimate by > inlining single function by up to 12 instructions. This is higher than on > pretty-ipa branch still, since we are not that good on early optimizing yet and > some C++ benchmarks (tramp/botan/boost) degrade when reduced to 7 as used by > tramp3d. In tramp3d it is mostly caused by dead loops in constructors, and I ^^^^^^ pretty-ipa :) > hope that merging IPA-SRA and CD-DCE improvements will care this on all three > benchmarks. At -O2 early inliner needs to be somewhat speculative since it > don't know the function profiles yet. It however seems stupid to allow code > size growth at -Os in general.
Honza, I can take care of the CSiBE run if you want.
I can schedule it for nightly testing today, but if you have easier setup for CSiBE, it would help :) Thanks! Honza
I see no effect whatsoever of the patch for for CSiBE on arm-elf-unknown.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 > I see no effect whatsoever of the patch for for CSiBE on arm-elf-unknown. At -Os? Honza
Yes, at -Os.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 OK, on i386 it has some effect according to our nightly tester it is 3524421->3510754. The size used to be as low as 3431090 so it is just small improvement. I guess I will commit the patch anyway as it is quite obvious fix. The other problem might be the "likely_eliminated_by_inlining_p" predicate that is very optimisitic. This predicate makes inliner to believe that all indirect reads and writes to/from pointers passed to function or function parameters will be optimized out. This is important to allow inlining of methods and SRAing out objects in C++ and devirtualizing calls, but for C code it is bit too optimistic. Partly this can be cured by IPA-SRA and Martin has WIP patch for clonning that contains more fine grained analysis of function body size specialized for given parameters. I however doubt they will catch all the cases we need for C++. Perhaps simply disabling the predicate for -Os or making it just weak hint (removing some percentage of estimated cost) is best way to go, I am just re-testing it on vangelis with size estimates ignoring it. Honza
(In reply to comment #13) > I am just re-testing it on vangelis with size > estimates ignoring it. Honza - Any updates on this ? Ramana > > Honza >
I've been discussing this on IRC a while ago with Richard Guenther, but forgot to add a record. It seems that for 4.5, it is best to leave inlining heruistics as it is. THe code size regression come mainly from bzip that is excercising kind of typical "bad luck" scenario. Other known problem in 4.5 inlining is tramp3d where we now deliver worse than best known performance, but still not worse than one of 4.4. I spent some time playing with this and only way to get 4.5 inliner to solve these faults is to add new parameter that allows early inlining to inline forwarder functions that do increase code size estimates by small amount. This helps to solve both tramp3d and bzip problems but also cause code size issues in some benchmarks (mostly not regressions over 4.4) and is quite ugly. Since re-tunning heuristics needs significant amount of effort and it was done earlier in stage1 of 4.5 after merging changes from pretty-ipa, it seems better to leave as it is now. Also after LTO merge it seems, according to results posted by Vladimir Marakov, that the inliner is actually perfomring very well compared to other compilers. For 4.6 I do have number of plans. With FRE in early optimization queue we invalidate need for some of early inlining and also IPA stuff should be making us less dependent on inling overall. But I would propose this PR to be wontfix for 4.5. Honza
(In reply to comment #15) > I've been discussing this on IRC a while ago with Richard Guenther, but forgot > to add a record. > > It seems that for 4.5, it is best to leave inlining heruistics as it is. THe > code size regression come mainly from bzip that is excercising kind of typical > "bad luck" scenario. Other known problem in 4.5 inlining is tramp3d where we > now deliver worse than best known performance, but still not worse than one of > 4.4. > > I spent some time playing with this and only way to get 4.5 inliner to solve > these faults is to add new parameter that allows early inlining to inline > forwarder functions that do increase code size estimates by small amount. This > helps to solve both tramp3d and bzip problems but also cause code size issues > in some benchmarks (mostly not regressions over 4.4) and is quite ugly. > > Since re-tunning heuristics needs significant amount of effort and it was done > earlier in stage1 of 4.5 after merging changes from pretty-ipa, it seems better > to leave as it is now. Also after LTO merge it seems, according to results > posted by Vladimir Marakov, that the inliner is actually perfomring very well > compared to other compilers. > > For 4.6 I do have number of plans. With FRE in early optimization queue we > invalidate need for some of early inlining and also IPA stuff should be making > us less dependent on inling overall. > > But I would propose this PR to be wontfix for 4.5. Indeed. There is also some miscounting of overall unit size, Micha has a patch for that (but it completely chokes tramp3d results). There is also the early inliner cleanups I have done at some point. Thus, I suppose we can look at this early during 4.6 development again.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 > Indeed. > > There is also some miscounting of overall unit size, Micha has a patch for > that (but it completely chokes tramp3d results). There is also the Where is the patch? > early inliner cleanups I have done at some point. Thus, I suppose we can > look at this early during 4.6 development again. Well, those should not affect the resulting inlining, right? Honza
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 On Sun, 28 Mar 2010, hubicka at ucw dot cz wrote: > ------- Comment #17 from hubicka at ucw dot cz 2010-03-28 16:33 ------- > Subject: Re: [4.5 regression] 0.5% code size > regression caused by r147852 > > > Indeed. > > > > There is also some miscounting of overall unit size, Micha has a patch for > > that (but it completely chokes tramp3d results). There is also the > > Where is the patch? Somewhere - you have to as Micha. > > early inliner cleanups I have done at some point. Thus, I suppose we can > > look at this early during 4.6 development again. > > Well, those should not affect the resulting inlining, right? In theory not. In practice it removes the iteration if I remember correctly. Richard.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 > > > There is also some miscounting of overall unit size, Micha has a patch for > > > that (but it completely chokes tramp3d results). There is also the > > > > Where is the patch? > > Somewhere - you have to as Micha. I think I saw one but it was wrong. I would be interested to at least know what this patch is about :) > > > In theory not. In practice it removes the iteration if I remember > correctly. Yes, but that should not affect the metrics (hopefully) Anyway the 4.6 inliner will probably again behave quite differently - I want to get FRE early, possibly get partial inlining done and we will have IPA AA that all affects effectivity of inlining. Honza
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 On Sun, 28 Mar 2010, hubicka at ucw dot cz wrote: > ------- Comment #19 from hubicka at ucw dot cz 2010-03-28 16:56 ------- > Subject: Re: [4.5 regression] 0.5% code size > regression caused by r147852 > > > > > There is also some miscounting of overall unit size, Micha has a patch for > > > > that (but it completely chokes tramp3d results). There is also the > > > > > > Where is the patch? > > > > Somewhere - you have to as Micha. > > I think I saw one but it was wrong. I would be interested to at least know > what this patch is about :) It's about not accounting static called once functions twice when decreasing overall unit size. > > In theory not. In practice it removes the iteration if I remember > > correctly. > > Yes, but that should not affect the metrics (hopefully) > Anyway the 4.6 inliner will probably again behave quite differently - I want to > get FRE > early, possibly get partial inlining done and we will have IPA AA that all > affects effectivity > of inlining. Of course. Richard.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 > > I think I saw one but it was wrong. I would be interested to at least know > > what this patch is about :) > > It's about not accounting static called once functions twice when > decreasing overall unit size. Ah, I saw that one and I think it was wrong (and that is also why it regressed on tramp3d). I will dig it out. Honza
Created attachment 20247 [details] the patch richi was talking about No, you can't have seen it, I never posted it but held onto it for 4.6. I created it in context of PR43058, where Richi noted the obviously non-sensical inlining decisions. The patch fixes two bugs: 1) overall_size is reduced twice for the same function, once in cgraph_clone_inlined_nodes, once in cgraph_mark_inline_edge (which calls the former), this leads to double accounting 2) cgraph_check_inline_limits checks the wrong size against the PARAM_LARGE_FUNCTION_INSNS parameter, it needs to use the original size, not the one estimated after inlining. (2) depends on how the PARAM_LARGE_FUNCTION_INSNS is interpreted, but (1) clearly fixes a bug.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 > 1) overall_size is reduced twice for the same function, once in > cgraph_clone_inlined_nodes, once in cgraph_mark_inline_edge (which calls > the former), this leads to double accounting Hmm, yep, it is bug here and I guess it makes tramp3d unhappy since it relies on the overall unit growth. I will try to fix this and retune to see if this can be used to help the CSiBE regression that also might be related to this thinko. > 2) cgraph_check_inline_limits checks the wrong size against the > PARAM_LARGE_FUNCTION_INSNS parameter, it needs to use the original size, > not the one estimated after inlining. Well, PARAM_LARGE_FUNCTION_INSNS was meant to let small functions to grow as much as they want until the threshold is hit, so size after inlining makes sense here. Honza
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 On Sat, 3 Apr 2010, hubicka at ucw dot cz wrote: > ------- Comment #23 from hubicka at ucw dot cz 2010-04-03 21:02 ------- > Subject: Re: [4.5 regression] 0.5% code size > regression caused by r147852 > > > 1) overall_size is reduced twice for the same function, once in > > cgraph_clone_inlined_nodes, once in cgraph_mark_inline_edge (which calls > > the former), this leads to double accounting > > Hmm, yep, it is bug here and I guess it makes tramp3d unhappy since it relies > on > the overall unit growth. I will try to fix this and retune to see if this can > be > used to help the CSiBE regression that also might be related to this thinko. > > > 2) cgraph_check_inline_limits checks the wrong size against the > > PARAM_LARGE_FUNCTION_INSNS parameter, it needs to use the original size, > > not the one estimated after inlining. > Well, PARAM_LARGE_FUNCTION_INSNS was meant to let small functions to grow as > much > as they want until the threshold is hit, so size after inlining makes sense > here. But the the code as-is allows unlimited growth of a function (well, by PARAM_LARGE_FUNCTION_GROWTH for each inlining; the limit is basically PARAM_LARGE_FUNCTION_INSNS * (1 + PARAM_LARGE_FUNCTION_GROWTH/100) ^ n with n being the number of functions we inline into the function). So it's not really a limit of the growth but of the growth rate ;) Richard.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 > But the the code as-is allows unlimited growth of a function (well, > by PARAM_LARGE_FUNCTION_GROWTH for each inlining; the limit is > basically PARAM_LARGE_FUNCTION_INSNS * (1 + > PARAM_LARGE_FUNCTION_GROWTH/100) ^ n with n being the number of Size after inlining decide whether we want to apply PARAM_LARGE_FUNCTION_GROWTH or not. When we decide so (based on overall function size), the limit is computed based on size without inlining. So PARAM_LARGE_FUNCTION_INSNS only adds a buffer for small functions, but should not interfere with the growth limits once function gets bigger than that. (at least it is intended to work this way, I will double check the implementation after returning) Honza
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 As for history, I oriignally had only the perentage limits at place, but then found that they behave really erratically on small units and small functions. (especially in kernel source where you have tiny units with two functions that needs to be inlined one to other). So I added this large function/unit thresholds. They should be symmetric for functions and units. Honza
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 And after checking the code, I think it is correct. I.e. limit is computed on size before inlining of caller or callee (this is to allow large callees to be inlined into tiny wrappers). So overall limit for size after inlininig is max(LARGE_FUNCTION_INSNS, orig_size + orig_size * LARGE_FUNCTION_GROWTH) that seems sane to me. Thinking more about the double accounting bug, it probably should not affect the -Os compilation too much since we should not be inlining anything large at first place. Still will make a fix at monday and probably try to push it at least to 4.5.1 or so. It definitly allows to construct units where we let a lot more growth than we are supposed so. Honza
I don't think we should fix the double-accounting bug for the 4.5 series, when we tried it on SPEC it caused several regression, meaning we would need much more fine-tuning. We have time for that for 4.6.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 > I don't think we should fix the double-accounting bug for the 4.5 series, > when we tried it on SPEC it caused several regression, meaning we would need > much more fine-tuning. We have time for that for 4.6. Well, we need to compensate for overall unit growth then, I will do so at pretty-ipa so it will get some testing and see how much trouble it causes. I am fine with this not being fixed in 4.5.x as long as we won't run into real world testcase where double counting causes explosion of unit size. Honza
I think it is a really, really bad signal if a bug like this, where the revision that introduced the issue was identified >9 months ago, remains unfixed for GCC 4.5. I, for one, wouldn't care hunting down revisions that introduce regressions in stage1 anymore, if component maintainers and release managers just postpone fixing the issue until it is too late to fix for a release.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 On Tue, 6 Apr 2010, steven at gcc dot gnu dot org wrote: > ------- Comment #30 from steven at gcc dot gnu dot org 2010-04-06 10:56 ------- > I think it is a really, really bad signal if a bug like this, where the > revision that introduced the issue was identified >9 months ago, remains > unfixed for GCC 4.5. > > I, for one, wouldn't care hunting down revisions that introduce regressions in > stage1 anymore, if component maintainers and release managers just postpone > fixing the issue until it is too late to fix for a release. Well, the fix doesn't fix this bug. There's no patch to fix this bug (yet). Also 0.5% code size increase doesn't look important enough to me.
Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 > I think it is a really, really bad signal if a bug like this, where the > revision that introduced the issue was identified >9 months ago, remains > unfixed for GCC 4.5. This particular bug should not affect -Os scores at all as discussed earlier. Overall unit growth limits will never hit at -Os since we never enlarge unit size. I spent fair amount of time tracking down the individual CSiBE code size increases and fixed most of them. At the moment, I believe, the only remaining issue is bzip2 size growth. The inlining decisions here are sane at the local level as inliner sees them, just the resulting binary gets bigger. Inling is a heuristic even at -Os and we can't expect it to work in all cases. Here the old inliner got lucky by simply not accounting any stores/loads as real instructions allowing the functions to be inlined at letting later optimizers to produce smaller code. Honza
Steven, please note that this PR was proposed WONTFIX for 4.5 already in comment #15. The discussion after that was about something that is only slightly related to this bug, something that wouldn't actually affect this very bug.
GCC 4.5.0 is being released. Deferring to 4.5.1.
If your discussions are only slightly related to this bug and don't affect -Os, then why are you having that discussion here? Anyway. If this is WONTFIX for GCC 4.5, then it should be marked as such (remove "4.5 regression" from subject, open new PR for other issue, whatever).
GCC 4.5.1 is being released, adjusting target milestone.
Author: hubicka Date: Wed Nov 10 02:35:19 2010 New Revision: 166517 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=166517 Log: PR tree-optimization/40436 * ipa-inline.c (leaf_node_p): Implement using is_inexpensive_builtin. * tree-inline.c (estimate_num_insns): Inexpensive builtins are like normal instructions; be sure bultin is not implemented in this file; compute non-zero return cost. (init_inline_once): Reduce builtin_call_cost to 1; set return cost. * tree-inline.h (eni_weights_d): Add return cost. Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-inline.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/recip-vec-sqrtf-avx.c trunk/gcc/tree-inline.c trunk/gcc/tree-inline.h
Author: hubicka Date: Thu Nov 11 22:08:26 2010 New Revision: 166624 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=166624 Log: PR tree-optimize/40436 * gcc.dg/tree-ssa/inline-5.c: New testcase. * gcc.dg/tree-ssa/inline-6.c: New testcase. * ipa-inline.c (likely_eliminated_by_inlining_p): Rename to ... (eliminated_by_inlining_prob): ... this one; return 50% probability for SRA. (estimate_function_body_sizes): Update use of eliminated_by_inlining_prob; estimate static function size for 2 instructions. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/inline-5.c trunk/gcc/testsuite/gcc.dg/tree-ssa/inline-6.c Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-inline.c trunk/gcc/testsuite/ChangeLog
I ran comparsion on x86-64 comparing mainline with GCC 4.3. We get bigger sum of object sizes, but results are not too bad overall. Main code size grains are: 222 jikespg-1.3src/remspb 247 OpenTCP-1.0.4smtp/smtp_clientb 248 linux-2.4.23-pre3-testplatformarch/testplatform/kernel/setupb 256 linux-2.4.23-pre3-testplatformlib/rbtreeb 261 teem-1.6.0-srcsrc/bane/hvolb 262 jikespg-1.3src/produceb 265 mpeg2dec-0.3.1libvo/yuv2rgbb 267 zlib-1.1.4treesb 280 linux-2.4.23-pre3-testplatformnet/ipv4/tcp_ipv4b 287 jikespg-1.3src/lpgparseb 290 teem-1.6.0-srcsrc/nrrd/reorderb 295 linux-2.4.23-pre3-testplatformarch/testplatform/mm/faultb 295 linux-2.4.23-pre3-testplatformlib/zlib_deflate/deftreeb 306 jpeg-6bjdcoefctb 311 linux-2.4.23-pre3-testplatformkernel/exitb 315 teem-1.6.0-srcsrc/nrrd/test/axb 320 linux-2.4.23-pre3-testplatformfs/nfs/procb 320 linux-2.4.23-pre3-testplatformnet/ipv4/tcpb 324 cg_compiler_opensrccompileb 337 flex-2.5.31miscb 368 teem-1.6.0-srcsrc/nrrd/ccb 374 teem-1.6.0-srcsrc/air/enumb 376 linux-2.4.23-pre3-testplatformfs/locksb 377 flex-2.5.31parseb 388 teem-1.6.0-srcsrc/gage/updateb 400 teem-1.6.0-srcsrc/gage/sclfilterb 414 OpenTCP-1.0.4pop3/pop3_clientb 417 teem-1.6.0-srcsrc/hest/parseHestb 421 teem-1.6.0-srcsrc/nrrd/filtb 432 libmspackmspack/lzxdb 433 cg_compiler_opensrctokensb 442 compilerscannerb 446 linux-2.4.23-pre3-testplatformkernel/schedb 480 linux-2.4.23-pre3-testplatformkernel/signalb 510 teem-1.6.0-srcsrc/gage/miscGageb 530 linux-2.4.23-pre3-testplatformfs/nfsd/statsb 532 teem-1.6.0-srcsrc/nrrd/supersetb 534 jikespg-1.3src/spacetabb 578 ttt-0.10.1.preprocsrc/connect4b 580 teem-1.6.0-srcsrc/echo/boundsb 601 cg_compiler_opensrcconstfoldb 699 jikespg-1.3src/globalsb 758 bzip2-1.0.2blocksortb 790 jpeg-6btransuppb 810 teem-1.6.0-srcsrc/nrrd/accessorsb 823 flex-2.5.31scanb 950 linux-2.4.23-pre3-testplatformnet/ipv4/routeb 1069 ttt-0.10.1.preprocsrc/tttb 1072 linux-2.4.23-pre3-testplatformkernel/sysb 1172 libmspackmspack/qtmdb 1453 linux-2.4.23-pre3-testplatformnet/ipv4/tcp_inputb 1643 bzip2-1.0.2compressb 2009 teem-1.6.0-srcsrc/nrrd/convertNrrdb 16144 linux-2.4.23-pre3-testplatformarch/testplatform/kernel/init_taskb init_taskb is bogus: union task_union init_task_union __attribute__((aligned(16384))) = { { state: 0, flags: 0, sigpending: 0, addr_limit: ((mm_segment_t) { (0) }), exec_domain: &default_exec_domain, lock_depth: -1, counter: (10*100/100), nice: (0), policy: 0, mm: ((void *)0), active_mm: &init_mm, cpus_runnable: ~0UL, cpus_allowed: ~0UL, run_list: { &(init_task_union.task.run_list), &(init_task_union.task.run_list) }, next_task: &init_task_union.task, prev_task: &init_task_union.task, p_opptr: &init_task_union.task, p_pptr: &init_task_union.task, thread_group: { &(init_task_union.task.thread_group), &(init_task_union.task.thread_group) }, wait_chldexit: { lock: (spinlock_t) { }, task_list: { &(init_task_union.task.wait_chldexit).task_list, &(init_task_union.task.wait_chldexit).task_list }, }, real_timer: { function: it_real_fn }, cap_effective: (~0 & ~(1 << (8))), cap_inheritable: (0), cap_permitted: (~0), keep_capabilities: 0, rlim: { { (~0UL), (~0UL) }, { (~0UL), (~0UL) }, { (~0UL), (~0UL) }, { (8*1024*1024), (~0UL) }, { 0, (~0UL) }, { (~0UL), (~0UL) }, { 0, 0 }, { 1024, 1024 }, { (~0UL), (~0UL) }, { (~0UL), (~0UL) }, { (~0UL), (~0UL) }, }, user: (&root_user), comm: "swapper", thread: {{0,{{0},{0},{0},{0},{0},{0},{0},{0},{0},{0}, {0},{0},{0},{0},{0},{0}}}, 0, 0, sizeof((init_task_union.stack)) + (addr_t) &(init_task_union.stack), ((unsigned long)((addr_t) &swapper_pg_dir[0]) + (0x4|0x1|0x40|0x100)), 0,0,0, (per_struct) {{{{0,}}},0,0,0,0,{{0,}}}, 0, 0, 0 }, fs: &init_fs, files: &init_files, sigmask_lock: (spinlock_t) { }, sig: &init_signals, pending: { ((void *)0), &init_task_union.task.pending.head, {{0}}}, blocked: {{0}}, alloc_lock: (spinlock_t) { }, journal_info: ((void *)0), } }; tsk... Looks like GCC 4.3 ignored overly large alignments of variables.
Ahh, got the signs wrong. Those was main wins, main growths are -3101 linux-2.4.23-pre3-testplatformfs/ext3/superb -1622 compilervasb -1597 linux-2.4.23-pre3-testplatformdrivers/char/n_ttyb -1385 teem-1.6.0-srcsrc/limn/test/soidb -1321 linux-2.4.23-pre3-testplatformmm/filemapb -1287 linux-2.4.23-pre3-testplatformfs/ext3/iallocb -1116 linux-2.4.23-pre3-testplatformfs/ext3/inodeb -1057 teem-1.6.0-srcsrc/limn/test/tpsb -1039 cg_compiler_opensrcsupportb -941 teem-1.6.0-srcsrc/ell/quatb -876 teem-1.6.0-srcsrc/nrrd/winKernelb -861 teem-1.6.0-srcsrc/nrrd/measureb -828 OpenTCP-1.0.4ethernetb -805 linux-2.4.23-pre3-testplatformkernel/exec_domainb -702 jikespg-1.3src/ctabsb -701 linux-2.4.23-pre3-testplatformfs/nfsd/vfsb -681 linux-2.4.23-pre3-testplatformfs/bufferb -681 linux-2.4.23-pre3-testplatformmm/memoryb -680 teem-1.6.0-srcsrc/echo/test/trendb -646 teem-1.6.0-srcsrc/nrrd/kernelb -613 linux-2.4.23-pre3-testplatformnet/core/devb -613 linux-2.4.23-pre3-testplatformnet/ipv4/igmpb -605 bzip2-1.0.2decompressb -601 linux-2.4.23-pre3-testplatformfs/ext3/ballocb -577 teem-1.6.0-srcsrc/echo/colorb -574 linux-2.4.23-pre3-testplatformmm/shmemb -514 compilerparserb -511 linux-2.4.23-pre3-testplatformfs/nameib -497 linux-2.4.23-pre3-testplatformfs/pipeb -482 linux-2.4.23-pre3-testplatformfs/jbd/transactionb -480 linux-2.4.23-pre3-testplatformfs/nfs/writeb -473 libpng-1.2.5pngrtranb -456 linux-2.4.23-pre3-testplatformfs/jbd/journalb -417 linux-2.4.23-pre3-testplatformfs/namespaceb -414 teem-1.6.0-srcsrc/ten/tendSatinb -405 linux-2.4.23-pre3-testplatformnet/sunrpc/xprtb -403 cg_compiler_opensrcprintutilsb -403 teem-1.6.0-srcsrc/echo/intxb -396 linux-2.4.23-pre3-testplatformarch/testplatform/kernel/debugb -383 teem-1.6.0-srcsrc/mite/rayb -379 linux-2.4.23-pre3-testplatformnet/sunrpc/schedb -367 linux-2.4.23-pre3-testplatformnet/netlink/af_netlinkb -362 linux-2.4.23-pre3-testplatformmm/mmapb -359 linux-2.4.23-pre3-testplatformnet/ipv4/af_inetb -356 linux-2.4.23-pre3-testplatformnet/ipv4/tcp_minisocksb -342 linux-2.4.23-pre3-testplatformnet/sunrpc/svcsockb -333 teem-1.6.0-srcsrc/limn/qnb -328 libpng-1.2.5pngsetb -328 linux-2.4.23-pre3-testplatformfs/lockd/svclockb -327 linux-2.4.23-pre3-testplatformfs/nfsd/nfs3procb -316 linux-2.4.23-pre3-testplatformdrivers/char/memb -315 linux-2.4.23-pre3-testplatformnet/core/neighbourb -307 compilercgb -305 linux-2.4.23-pre3-testplatformlib/vsprintfb Anyway that source bove is really bogus.
Created attachment 22389 [details] preprocessed ext/super.c Hi, this testcase shows that we are no longer able to optimize away ext3_sops in sb->s_magic = (__builtin_constant_p((__u16)((es->s_magic))) ? ({ __u16 __x = (((es->s_magic))); ((__u16)( (((__u16)(__x) & (__u16)0x00ffU) << 8) | (((__u16)(__x) & (__u16)0xff00U) >> 8) )); }) : __fswab16(((es->s_magic)))); if (sb->s_magic != 0xEF53) { if (!silent) printk("<3>" "VFS: Can't find ext3 filesystem on dev %s.\n", bdevname(dev)); goto failed_mount; } The problem is that varpool code now expect that variable accesses are optimized out at the end of gimple queue instead of using TREE_SYMBOL_REFERENCED bit. In this case sb->s_magic shomehow manages to get undefined and in GIMPLE land we keep the conditional around, while in RTL land init-regs initialize it to 0 that consequently makes mount to always fail. I guess it is not real code quality bug since it happens only on undefined behaviour code, but we might consider doing initialization by zero sometime in gimple queue, too. Also I am not quite sure if we are not misoptimizing the code, it is convoluted.
Fixing little bug in unreachable function removal and working around PR46470 gets me to: -2933 linux-2.4.23-pre3-testplatformfs/ext3/superb 8069 -1572 linux-2.4.23-pre3-testplatformdrivers/char/n_ttyb 6038 -1385 teem-1.6.0-srcsrc/limn/test/soidb 5396 -1267 linux-2.4.23-pre3-testplatformfs/ext3/iallocb 2582 -1203 linux-2.4.23-pre3-testplatformmm/filemapb 15592 -1057 teem-1.6.0-srcsrc/limn/test/tpsb 4159 -1010 linux-2.4.23-pre3-testplatformfs/ext3/inodeb 12315 -843 teem-1.6.0-srcsrc/ell/quatb 11314 -834 cg_compiler_opensrcsupportb 27636 -828 OpenTCP-1.0.4ethernetb 1495 -820 teem-1.6.0-srcsrc/nrrd/winKernelb 16288 -807 teem-1.6.0-srcsrc/nrrd/measureb 9444 -677 jikespg-1.3src/ctabsb 48223 -656 teem-1.6.0-srcsrc/echo/test/trendb 9744 -619 linux-2.4.23-pre3-testplatformmm/memoryb 8051 -605 bzip2-1.0.2decompressb 8454 -595 teem-1.6.0-srcsrc/nrrd/kernelb 21446 -586 linux-2.4.23-pre3-testplatformfs/nfsd/vfsb 9082 -582 linux-2.4.23-pre3-testplatformfs/ext3/ballocb 4095 -572 teem-1.6.0-srcsrc/echo/colorb 7500 -546 linux-2.4.23-pre3-testplatformfs/bufferb 14617 -521 linux-2.4.23-pre3-testplatformnet/ipv4/igmpb 12205 -511 linux-2.4.23-pre3-testplatformnet/core/devb 11233 -462 libpng-1.2.5pngrtranb 21062 -427 linux-2.4.23-pre3-testplatformfs/nfs/writeb 6586 -419 linux-2.4.23-pre3-testplatformfs/nameib 11526 -414 teem-1.6.0-srcsrc/ten/tendSatinb 5479 -404 linux-2.4.23-pre3-testplatformfs/jbd/transactionb 10746 -387 teem-1.6.0-srcsrc/echo/intxb 10122 -380 teem-1.6.0-srcsrc/mite/rayb 3376 -357 linux-2.4.23-pre3-testplatformfs/jbd/journalb 9266 -349 cg_compiler_opensrcprintutilsb 12377 -340 linux-2.4.23-pre3-testplatformfs/namespaceb 5715 -317 teem-1.6.0-srcsrc/limn/qnb 2544 -311 linux-2.4.23-pre3-testplatformnet/ipv4/tcp_minisocksb 5169 -306 linux-2.4.23-pre3-testplatformnet/sunrpc/xprtb 8042 -291 linux-2.4.23-pre3-testplatformnet/netlink/af_netlinkb 5792 -289 linux-2.4.23-pre3-testplatformlib/vsprintfb 4266 -286 linux-2.4.23-pre3-testplatformnet/sunrpc/svcsockb 6488 -274 compilervasb 6631 What surprise me is that so far there was no inliner (or size estimate) related problems... Does the ARM regression reported still exist? I suppose the fact that we do see regression at our x86_64 CSiBE testing might be just because of the unforunate 8 page alignments in the kernel.
Created attachment 22392 [details] Preprocessed ialloc. One of smaller units that grows a lot. The culprint seems to be ext3_new_inode that is a lot smaller in GCC 4.3 variant. Not sure why, the inlining decisions seems sane.
OK, ialloc is because 4.3 folds: oldbit_430 = 0; D.12699_431 = oldbit_430 & 1; D.12698_462 = D.12699_431; D.12095_241 = D.12698_462; if (D.12095_241 != 0) goto <bb 71>; else goto <bb 72>; In mainline the same sequence misses oldbit_430 = 0. static __inline__ int test_and_set_bit_simple(unsigned long nr, volatile void * addr) { unsigned long reg1, reg2; int oldbit; return oldbit & 1; } static __inline__ int test_and_clear_bit_simple(unsigned long nr, volatile void * addr) { unsigned long reg1, reg2; int oldbit; return oldbit & 1; } static __inline__ int test_and_change_bit_simple(unsigned long nr, volatile void * addr) { unsigned long reg1, reg2; int oldbit; return oldbit & 1; } So another source code bug. Richard, do you remember if we dropped initialization by zero for uninitialized vars? I am officially declaring kernel part of CSiBE irrelevant and will look at the other tests. Honza
On Sun, 14 Nov 2010, hubicka at gcc dot gnu.org wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40436 > > --- Comment #44 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-11-14 16:16:35 UTC --- > OK, ialloc is because 4.3 folds: > oldbit_430 = 0; > D.12699_431 = oldbit_430 & 1; > D.12698_462 = D.12699_431; > D.12095_241 = D.12698_462; > if (D.12095_241 != 0) > goto <bb 71>; > else > goto <bb 72>; > > In mainline the same sequence misses oldbit_430 = 0. > > static __inline__ int > test_and_set_bit_simple(unsigned long nr, volatile void * addr) > { > unsigned long reg1, reg2; > int oldbit; > > return oldbit & 1; > } > > > > > > static __inline__ int > test_and_clear_bit_simple(unsigned long nr, volatile void * addr) > { > unsigned long reg1, reg2; > int oldbit; > > > return oldbit & 1; > } > > > > > > static __inline__ int > test_and_change_bit_simple(unsigned long nr, volatile void * addr) > { > unsigned long reg1, reg2; > int oldbit; > > > return oldbit & 1; > } > > So another source code bug. > Richard, do you remember if we dropped initialization by zero for uninitialized > vars? I don't even remember that we did that. Btw, CCP should be able to fold it with UNDEFINED given that & 1 cannot yield zero (but we're very conservative here now due to past bugs ...)
I think CCP is not folding one way or another as it might lead to inconsistent control flows (i.e. one place assuming that the given undefined value is 0, other that 1). Initializing them to 0 is one consistent way to get stuff optimized out. I am not biggest friend of that (I think they should stay undefined and liveness should handle it well) but givent hat RTL land does it anyway there is no point to bother. Anyway I looked at the date for jump in CSiBE sizes in our tester. It is May 30 2009 It coincide with Richard's EH unwind reorg 2009-05-29 Richard Henderson <rth@redhat.com> * cfgcleanup.c (try_crossjump_to_edge): Only skip past NOTE_INSN_BASIC_BLOCK. * cfglayout.c (duplicate_insn_chain): Copy epilogue insn marks. Duplicate NOTE_INSN_EPILOGUE_BEG notes. * cfgrtl.c (can_delete_note_p): Allow NOTE_INSN_EPILOGUE_BEG to be deleted. * dwarf2out.c (struct cfa_loc): Change indirect field to bitfield, add in_use field. (add_cfi): Disable check redefining cfa away from drap. (lookup_cfa_1): Add remember argument; handle remember/restore. (lookup_cfa): Pass remember argument. (cfa_remember): New. given that it makes tables asynchronous and it is thus correcntess issue and given that it is misaccounted as code size by "size" command and thus also CSiBE and given that I analyzed quite few testcases and found no inliner bugs, I think the bug no longer exists (-Os inliner was improved a lot). It would be nice if someone tested it using the cross, for now I am putting it into waiting stage and will open new PRs for the few new issues I noticed.
GCC 4.5.2 is being released, adjusting target milestone.
GCC 4.5.3 is being released, adjusting target milestone.
Bug without a clear issue, and in WAITING state for >6 months -> closing.