Bug 40436 - [4.5/4.6/4.7 regression] 0.5% code size regression caused by r147852
Summary: [4.5/4.6/4.7 regression] 0.5% code size regression caused by r147852
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.5.0
: P2 normal
Target Milestone: 4.5.4
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 40437 (view as bug list)
Depends on:
Blocks: 16996
  Show dependency treegraph
 
Reported: 2009-06-13 22:35 UTC by Richard Earnshaw
Modified: 2011-09-09 14:39 UTC (History)
8 users (show)

See Also:
Host:
Target: arm-eabi
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-06-19 00:00:34


Attachments
Proposed patch (303 bytes, patch)
2009-06-30 12:44 UTC, Jan Hubicka
Details | Diff
the patch richi was talking about (686 bytes, patch)
2010-03-29 10:14 UTC, Michael Matz
Details | Diff
preprocessed ext/super.c (67.68 KB, application/octet-stream)
2010-11-14 09:06 UTC, Jan Hubicka
Details
Preprocessed ialloc. (57.18 KB, application/octet-stream)
2010-11-14 11:06 UTC, Jan Hubicka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Earnshaw 2009-06-13 22:35:04 UTC
Rev 147852, which claims (according to http://gcc.gnu.org/ml/gcc-patches/2009-05/msg00812.html) to improve code size, really makes things worse by ~0.5% for ARM, thumb and thumb2 code when building CSiBE.
Comment 1 Richard Biener 2009-06-13 22:51:15 UTC
*** Bug 40437 has been marked as a duplicate of this bug. ***
Comment 2 Andrew Pinski 2009-06-13 23:07:37 UTC
Hmm, the measurements are most likely on x86 which might have slight differences when it comes to code size differences than arm.
Comment 3 Richard Earnshaw 2009-06-13 23:10:15 UTC
For ARM -Os 114 files in CSiBE increase in size (total increase 21449 bytes)
20 files decrease in size (total decreases 1039 bytes); over all increase 20410 bytes)

Worst single increase is from bzip2/compress (increase from 12495 to 19463 bytes).
Comment 4 Ramana Radhakrishnan 2009-06-19 00:00:34 UTC
Confirmed that the problem exists.
Comment 5 Ramana Radhakrishnan 2009-06-19 23:14:25 UTC
The difference essentially is because all functions getting inlined into BZ2_Compress_Block with the newer heuristics. 

Comment 6 Jan Hubicka 2009-06-30 12:44:25 UTC
Created attachment 18100 [details]
Proposed patch

The problem is that early inliner allows to increase code size estimate by inlining single function by up to 12 instructions.  This is higher than on pretty-ipa branch still, since we are not that good on early optimizing yet and some C++ benchmarks (tramp/botan/boost) degrade when reduced to 7 as used by tramp3d. In tramp3d it is mostly caused by dead loops in constructors, and I hope that merging IPA-SRA and CD-DCE improvements will care this on all three benchmarks. At -O2 early inliner needs to be somewhat speculative since it don't know the function profiles yet. It however seems stupid to allow code size growth at -Os in general.

This patch changes it reducing compress.o from 14k to 9k on i386. I guess I will need to give it full CSiBE run since it is question whether little speculative inlining won't result in better overall score.

Honza
Comment 7 Jan Hubicka 2009-06-30 12:46:16 UTC
Subject: Re:  [4.5 regression] 0.5% code size regression caused by r147852

> The problem is that early inliner allows to increase code size estimate by
> inlining single function by up to 12 instructions.  This is higher than on
> pretty-ipa branch still, since we are not that good on early optimizing yet and
> some C++ benchmarks (tramp/botan/boost) degrade when reduced to 7 as used by
> tramp3d. In tramp3d it is mostly caused by dead loops in constructors, and I
  ^^^^^^ pretty-ipa :)
> hope that merging IPA-SRA and CD-DCE improvements will care this on all three
> benchmarks. At -O2 early inliner needs to be somewhat speculative since it
> don't know the function profiles yet. It however seems stupid to allow code
> size growth at -Os in general.
Comment 8 Steven Bosscher 2009-06-30 13:14:19 UTC
Honza, I can take care of the CSiBE run if you want.
Comment 9 Jan Hubicka 2009-06-30 13:41:20 UTC
I can schedule it for nightly testing today, but if you have easier setup for CSiBE, it would help :)

Thanks!
Honza
Comment 10 Steven Bosscher 2009-06-30 19:55:57 UTC
I see no effect whatsoever of the patch for for CSiBE on arm-elf-unknown.
Comment 11 Jan Hubicka 2009-06-30 23:36:55 UTC
Subject: Re:  [4.5 regression] 0.5% code size regression caused by r147852

> I see no effect whatsoever of the patch for for CSiBE on arm-elf-unknown.
At -Os?

Honza
Comment 12 Steven Bosscher 2009-07-01 05:41:04 UTC
Yes, at -Os.
Comment 13 Jan Hubicka 2009-07-02 10:10:51 UTC
Subject: Re:  [4.5 regression] 0.5% code size regression caused by r147852

OK, on i386 it has some effect according to our nightly tester
it is 3524421->3510754.  The size used to be as low as 3431090
so it is just small improvement.  I guess I will commit the patch anyway
as it is quite obvious fix.

The other problem might be the "likely_eliminated_by_inlining_p"
predicate that is very optimisitic.

This predicate makes inliner to believe that all indirect reads and
writes to/from pointers passed to function or function parameters will
be optimized out.  This is important to allow inlining of methods and
SRAing out objects in C++ and devirtualizing calls, but for C code it is
bit too optimistic.

Partly this can be cured by IPA-SRA and Martin has WIP patch for
clonning that contains more fine grained analysis of function body size
specialized for given parameters. I however doubt they will catch all
the cases we need for C++.  Perhaps simply disabling the predicate for
-Os or making it just weak hint (removing some percentage of estimated
cost) is best way to go, I am just re-testing it on vangelis with size
estimates ignoring it.

Honza
Comment 14 Ramana Radhakrishnan 2009-12-09 17:09:25 UTC
(In reply to comment #13)

> I am just re-testing it on vangelis with size
> estimates ignoring it.

Honza - Any updates on this ?

Ramana
> 
> Honza
> 

Comment 15 Jan Hubicka 2010-03-07 20:37:43 UTC
I've been discussing this on IRC a while ago with Richard Guenther, but forgot to add a record.

It seems that for 4.5, it is best to leave inlining heruistics as it is.  THe code size regression come mainly from bzip that is excercising kind of typical "bad luck" scenario.  Other known problem in 4.5 inlining is tramp3d where we now deliver worse than best known performance, but still not worse than one of 4.4.

I spent some time playing with this and only way to get 4.5 inliner to solve these faults is to add new parameter that allows early inlining to inline forwarder functions that do increase code size estimates by small amount.  This helps to solve both tramp3d and bzip problems but also cause code size issues in some benchmarks (mostly not regressions over 4.4) and is quite ugly.

Since re-tunning heuristics needs significant amount of effort and it was done earlier in stage1 of 4.5 after merging changes from pretty-ipa, it seems better to leave as it is now. Also after LTO merge it seems, according to results posted by Vladimir Marakov, that the inliner is actually perfomring very well compared to other compilers.

For 4.6 I do have number of plans. With FRE in early optimization queue we invalidate need for some of early inlining and also IPA stuff should be making us less dependent on inling overall.

But I would propose this PR to be wontfix for 4.5.

Honza
Comment 16 Richard Biener 2010-03-28 15:46:45 UTC
(In reply to comment #15)
> I've been discussing this on IRC a while ago with Richard Guenther, but forgot
> to add a record.
> 
> It seems that for 4.5, it is best to leave inlining heruistics as it is.  THe
> code size regression come mainly from bzip that is excercising kind of typical
> "bad luck" scenario.  Other known problem in 4.5 inlining is tramp3d where we
> now deliver worse than best known performance, but still not worse than one of
> 4.4.
> 
> I spent some time playing with this and only way to get 4.5 inliner to solve
> these faults is to add new parameter that allows early inlining to inline
> forwarder functions that do increase code size estimates by small amount.  This
> helps to solve both tramp3d and bzip problems but also cause code size issues
> in some benchmarks (mostly not regressions over 4.4) and is quite ugly.
> 
> Since re-tunning heuristics needs significant amount of effort and it was done
> earlier in stage1 of 4.5 after merging changes from pretty-ipa, it seems better
> to leave as it is now. Also after LTO merge it seems, according to results
> posted by Vladimir Marakov, that the inliner is actually perfomring very well
> compared to other compilers.
> 
> For 4.6 I do have number of plans. With FRE in early optimization queue we
> invalidate need for some of early inlining and also IPA stuff should be making
> us less dependent on inling overall.
> 
> But I would propose this PR to be wontfix for 4.5.

Indeed.

There is also some miscounting of overall unit size, Micha has a patch for
that (but it completely chokes tramp3d results).  There is also the
early inliner cleanups I have done at some point.  Thus, I suppose we can
look at this early during 4.6 development again.
Comment 17 Jan Hubicka 2010-03-28 16:33:43 UTC
Subject: Re:  [4.5 regression] 0.5% code size
	regression caused by r147852

> Indeed.
> 
> There is also some miscounting of overall unit size, Micha has a patch for
> that (but it completely chokes tramp3d results).  There is also the

Where is the patch?

> early inliner cleanups I have done at some point.  Thus, I suppose we can
> look at this early during 4.6 development again.

Well, those should not affect the resulting inlining, right?

Honza
Comment 18 rguenther@suse.de 2010-03-28 16:43:20 UTC
Subject: Re:  [4.5 regression] 0.5% code size
 regression caused by r147852

On Sun, 28 Mar 2010, hubicka at ucw dot cz wrote:

> ------- Comment #17 from hubicka at ucw dot cz  2010-03-28 16:33 -------
> Subject: Re:  [4.5 regression] 0.5% code size
>         regression caused by r147852
> 
> > Indeed.
> > 
> > There is also some miscounting of overall unit size, Micha has a patch for
> > that (but it completely chokes tramp3d results).  There is also the
> 
> Where is the patch?

Somewhere - you have to as Micha.

> > early inliner cleanups I have done at some point.  Thus, I suppose we can
> > look at this early during 4.6 development again.
> 
> Well, those should not affect the resulting inlining, right?

In theory not.  In practice it removes the iteration if I remember
correctly.

Richard.
Comment 19 Jan Hubicka 2010-03-28 16:56:28 UTC
Subject: Re:  [4.5 regression] 0.5% code size
	regression caused by r147852

> > > There is also some miscounting of overall unit size, Micha has a patch for
> > > that (but it completely chokes tramp3d results).  There is also the
> > 
> > Where is the patch?
> 
> Somewhere - you have to as Micha.

I think I saw one but it was wrong.  I would be interested to at least know what this patch is about :)
> 
> 
> In theory not.  In practice it removes the iteration if I remember
> correctly.

Yes, but that should not affect the metrics (hopefully)
Anyway the 4.6 inliner will probably again behave quite differently - I want to get FRE
early, possibly get partial inlining done and we will have IPA AA that all affects effectivity
of inlining.

Honza
Comment 20 rguenther@suse.de 2010-03-28 17:00:14 UTC
Subject: Re:  [4.5 regression] 0.5% code size
 regression caused by r147852

On Sun, 28 Mar 2010, hubicka at ucw dot cz wrote:

> ------- Comment #19 from hubicka at ucw dot cz  2010-03-28 16:56 -------
> Subject: Re:  [4.5 regression] 0.5% code size
>         regression caused by r147852
> 
> > > > There is also some miscounting of overall unit size, Micha has a patch for
> > > > that (but it completely chokes tramp3d results).  There is also the
> > > 
> > > Where is the patch?
> > 
> > Somewhere - you have to as Micha.
> 
> I think I saw one but it was wrong.  I would be interested to at least know
> what this patch is about :)

It's about not accounting static called once functions twice when
decreasing overall unit size.

> > In theory not.  In practice it removes the iteration if I remember
> > correctly.
> 
> Yes, but that should not affect the metrics (hopefully)
> Anyway the 4.6 inliner will probably again behave quite differently - I want to
> get FRE
> early, possibly get partial inlining done and we will have IPA AA that all
> affects effectivity
> of inlining.

Of course.

Richard.
Comment 21 Jan Hubicka 2010-03-28 17:30:37 UTC
Subject: Re:  [4.5 regression] 0.5% code size
	regression caused by r147852

> > I think I saw one but it was wrong.  I would be interested to at least know
> > what this patch is about :)
> 
> It's about not accounting static called once functions twice when
> decreasing overall unit size.
Ah, I saw that one and I think it was wrong (and that is also why it regressed
on tramp3d).  I will dig it out.

Honza
Comment 22 Michael Matz 2010-03-29 10:14:34 UTC
Created attachment 20247 [details]
the patch richi was talking about

No, you can't have seen it, I never posted it but held onto it for 4.6.
I created it in context of PR43058, where Richi noted the obviously
non-sensical inlining decisions.  The patch fixes two bugs:
1) overall_size is reduced twice for the same function, once in
   cgraph_clone_inlined_nodes, once in cgraph_mark_inline_edge (which calls
   the former), this leads to double accounting
2) cgraph_check_inline_limits checks the wrong size against the 
   PARAM_LARGE_FUNCTION_INSNS parameter, it needs to use the original size,
   not the one estimated after inlining.

(2) depends on how the PARAM_LARGE_FUNCTION_INSNS is interpreted, but (1)
clearly fixes a bug.
Comment 23 Jan Hubicka 2010-04-03 21:02:05 UTC
Subject: Re:  [4.5 regression] 0.5% code size
	regression caused by r147852

> 1) overall_size is reduced twice for the same function, once in
>    cgraph_clone_inlined_nodes, once in cgraph_mark_inline_edge (which calls
>    the former), this leads to double accounting

Hmm, yep, it is bug here and I guess it makes tramp3d unhappy since it relies on
the overall unit growth.  I will try to fix this and retune to see if this can be
used to help the CSiBE regression that also might be related to this thinko.

> 2) cgraph_check_inline_limits checks the wrong size against the 
>    PARAM_LARGE_FUNCTION_INSNS parameter, it needs to use the original size,
>    not the one estimated after inlining.
Well, PARAM_LARGE_FUNCTION_INSNS was meant to let small functions to grow as much
as they want until the threshold is hit, so size after inlining makes sense here.

Honza
Comment 24 rguenther@suse.de 2010-04-03 21:13:21 UTC
Subject: Re:  [4.5 regression] 0.5% code size
 regression caused by r147852

On Sat, 3 Apr 2010, hubicka at ucw dot cz wrote:

> ------- Comment #23 from hubicka at ucw dot cz  2010-04-03 21:02 -------
> Subject: Re:  [4.5 regression] 0.5% code size
>         regression caused by r147852
> 
> > 1) overall_size is reduced twice for the same function, once in
> >    cgraph_clone_inlined_nodes, once in cgraph_mark_inline_edge (which calls
> >    the former), this leads to double accounting
> 
> Hmm, yep, it is bug here and I guess it makes tramp3d unhappy since it relies
> on
> the overall unit growth.  I will try to fix this and retune to see if this can
> be
> used to help the CSiBE regression that also might be related to this thinko.
> 
> > 2) cgraph_check_inline_limits checks the wrong size against the 
> >    PARAM_LARGE_FUNCTION_INSNS parameter, it needs to use the original size,
> >    not the one estimated after inlining.
> Well, PARAM_LARGE_FUNCTION_INSNS was meant to let small functions to grow as
> much
> as they want until the threshold is hit, so size after inlining makes sense
> here.

But the the code as-is allows unlimited growth of a function (well,
by PARAM_LARGE_FUNCTION_GROWTH for each inlining; the limit is
basically PARAM_LARGE_FUNCTION_INSNS * (1 + 
PARAM_LARGE_FUNCTION_GROWTH/100) ^ n with n being the number of
functions we inline into the function).  So it's not really a
limit of the growth but of the growth rate ;)

Richard.
Comment 25 Jan Hubicka 2010-04-03 21:19:33 UTC
Subject: Re:  [4.5 regression] 0.5% code size
	regression caused by r147852

> But the the code as-is allows unlimited growth of a function (well,
> by PARAM_LARGE_FUNCTION_GROWTH for each inlining; the limit is
> basically PARAM_LARGE_FUNCTION_INSNS * (1 + 
> PARAM_LARGE_FUNCTION_GROWTH/100) ^ n with n being the number of

Size after inlining decide whether we want to apply PARAM_LARGE_FUNCTION_GROWTH
or not.  When we decide so (based on overall function size), the limit is
computed based on size without inlining.

So PARAM_LARGE_FUNCTION_INSNS only adds a buffer for small functions, but should
not interfere with the growth limits once function gets bigger than that.
(at least it is intended to work this way, I will double check the
implementation after returning)

Honza
Comment 26 Jan Hubicka 2010-04-03 21:20:54 UTC
Subject: Re:  [4.5 regression] 0.5% code size
	regression caused by r147852

As for history, I oriignally had only the perentage limits at place, but then
found that they behave really erratically on small units and small functions.
(especially in kernel source where you have tiny units with two functions that
needs to be inlined one to other).  So I added this large function/unit
thresholds.  They should be symmetric for functions and units.

Honza
Comment 27 Jan Hubicka 2010-04-03 21:39:00 UTC
Subject: Re:  [4.5 regression] 0.5% code size
	regression caused by r147852

And after checking the code, I think it is correct. I.e. limit is computed on
size before inlining of caller or callee (this is to allow large callees to be
inlined into tiny wrappers).

So overall limit for size after inlininig is  max(LARGE_FUNCTION_INSNS,
orig_size + orig_size * LARGE_FUNCTION_GROWTH) that seems sane to me.

Thinking more about the double accounting bug, it probably should not affect
the -Os compilation too much since we should not be inlining anything large at
first place.  Still will make a fix at monday and probably try to push it at
least to 4.5.1 or so.  It definitly allows to construct units where we let a
lot more growth than we are supposed so.

Honza
Comment 28 Michael Matz 2010-04-06 10:34:59 UTC
I don't think we should fix the double-accounting bug for the 4.5 series,
when we tried it on SPEC it caused several regression, meaning we would need
much more fine-tuning.  We have time for that for 4.6.
Comment 29 Jan Hubicka 2010-04-06 10:46:12 UTC
Subject: Re:  [4.5 regression] 0.5% code size
	regression caused by r147852

> I don't think we should fix the double-accounting bug for the 4.5 series,
> when we tried it on SPEC it caused several regression, meaning we would need
> much more fine-tuning.  We have time for that for 4.6.

Well, we need to compensate for overall unit growth then, I will do so at
pretty-ipa so it will get some testing and see how much trouble it causes.
I am fine with this not being fixed in 4.5.x as long as we won't run into
real world testcase where double counting causes explosion of unit size.

Honza
Comment 30 Steven Bosscher 2010-04-06 10:56:24 UTC
I think it is a really, really bad signal if a bug like this, where the revision that introduced the issue was identified >9 months ago, remains unfixed for GCC 4.5.

I, for one, wouldn't care hunting down revisions that introduce regressions in stage1 anymore, if component maintainers and release managers just postpone fixing the issue until it is too late to fix for a release.
Comment 31 rguenther@suse.de 2010-04-06 11:00:00 UTC
Subject: Re:  [4.5 regression] 0.5% code size
 regression caused by r147852

On Tue, 6 Apr 2010, steven at gcc dot gnu dot org wrote:

> ------- Comment #30 from steven at gcc dot gnu dot org  2010-04-06 10:56 -------
> I think it is a really, really bad signal if a bug like this, where the
> revision that introduced the issue was identified >9 months ago, remains
> unfixed for GCC 4.5.
> 
> I, for one, wouldn't care hunting down revisions that introduce regressions in
> stage1 anymore, if component maintainers and release managers just postpone
> fixing the issue until it is too late to fix for a release.

Well, the fix doesn't fix this bug.  There's no patch to fix this
bug (yet).  Also 0.5% code size increase doesn't look important
enough to me.
Comment 32 Jan Hubicka 2010-04-06 11:05:03 UTC
Subject: Re:  [4.5 regression] 0.5% code size
	regression caused by r147852

> I think it is a really, really bad signal if a bug like this, where the
> revision that introduced the issue was identified >9 months ago, remains
> unfixed for GCC 4.5.

This particular bug should not affect -Os scores at all as discussed earlier.
Overall unit growth limits will never hit at -Os since we never enlarge unit
size. 

I spent fair amount of time tracking down the individual CSiBE code size
increases and fixed most of them.  At the moment, I believe, the only remaining
issue is bzip2 size growth.  The inlining decisions here are sane at the local
level as inliner sees them, just the resulting binary gets bigger. Inling is a
heuristic even at -Os and we can't expect it to work in all cases.  Here the
old inliner got lucky by simply not accounting any stores/loads as real
instructions allowing the functions to be inlined at letting later optimizers
to produce smaller code.

Honza
Comment 33 Michael Matz 2010-04-06 11:09:45 UTC
Steven, please note that this PR was proposed WONTFIX for 4.5 already in
comment #15.  The discussion after that was about something that is only
slightly related to this bug, something that wouldn't actually affect this
very bug.
Comment 34 Richard Biener 2010-04-06 11:20:17 UTC
GCC 4.5.0 is being released.  Deferring to 4.5.1.
Comment 35 Steven Bosscher 2010-04-06 11:32:37 UTC
If your discussions are only slightly related to this bug and don't affect -Os, then why are you having that discussion here?

Anyway. If this is WONTFIX for GCC 4.5, then it should be marked as such (remove "4.5 regression" from subject, open new PR for other issue, whatever).

Comment 36 Richard Biener 2010-07-31 09:29:29 UTC
GCC 4.5.1 is being released, adjusting target milestone.
Comment 37 Jan Hubicka 2010-11-10 02:35:23 UTC
Author: hubicka
Date: Wed Nov 10 02:35:19 2010
New Revision: 166517

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=166517
Log:

	PR tree-optimization/40436
	* ipa-inline.c (leaf_node_p): Implement using is_inexpensive_builtin.
	* tree-inline.c (estimate_num_insns): Inexpensive builtins are like
	normal instructions; be sure bultin is not implemented in this file;
	compute non-zero return cost.
	(init_inline_once): Reduce builtin_call_cost to 1; set return cost.
	* tree-inline.h (eni_weights_d): Add return cost.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/recip-vec-sqrtf-avx.c
    trunk/gcc/tree-inline.c
    trunk/gcc/tree-inline.h
Comment 38 Jan Hubicka 2010-11-11 22:08:31 UTC
Author: hubicka
Date: Thu Nov 11 22:08:26 2010
New Revision: 166624

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=166624
Log:
	PR tree-optimize/40436
	* gcc.dg/tree-ssa/inline-5.c: New testcase.
	* gcc.dg/tree-ssa/inline-6.c: New testcase.

	* ipa-inline.c (likely_eliminated_by_inlining_p): Rename to ...
	(eliminated_by_inlining_prob): ... this one; return 50% probability for
	SRA.
	(estimate_function_body_sizes): Update use of eliminated_by_inlining_prob;
	estimate static function size for 2 instructions.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/inline-5.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/inline-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline.c
    trunk/gcc/testsuite/ChangeLog
Comment 39 Jan Hubicka 2010-11-13 18:42:01 UTC
I ran comparsion on x86-64 comparing mainline with GCC 4.3.
We get bigger sum of object sizes, but results are not too bad overall.
Main code size grains are:
222   jikespg-1.3src/remspb
247   OpenTCP-1.0.4smtp/smtp_clientb
248   linux-2.4.23-pre3-testplatformarch/testplatform/kernel/setupb
256   linux-2.4.23-pre3-testplatformlib/rbtreeb
261   teem-1.6.0-srcsrc/bane/hvolb
262   jikespg-1.3src/produceb
265   mpeg2dec-0.3.1libvo/yuv2rgbb
267   zlib-1.1.4treesb
280   linux-2.4.23-pre3-testplatformnet/ipv4/tcp_ipv4b
287   jikespg-1.3src/lpgparseb
290   teem-1.6.0-srcsrc/nrrd/reorderb
295   linux-2.4.23-pre3-testplatformarch/testplatform/mm/faultb
295   linux-2.4.23-pre3-testplatformlib/zlib_deflate/deftreeb
306   jpeg-6bjdcoefctb
311   linux-2.4.23-pre3-testplatformkernel/exitb
315   teem-1.6.0-srcsrc/nrrd/test/axb
320   linux-2.4.23-pre3-testplatformfs/nfs/procb
320   linux-2.4.23-pre3-testplatformnet/ipv4/tcpb
324   cg_compiler_opensrccompileb
337   flex-2.5.31miscb
368   teem-1.6.0-srcsrc/nrrd/ccb
374   teem-1.6.0-srcsrc/air/enumb
376   linux-2.4.23-pre3-testplatformfs/locksb
377   flex-2.5.31parseb
388   teem-1.6.0-srcsrc/gage/updateb
400   teem-1.6.0-srcsrc/gage/sclfilterb
414   OpenTCP-1.0.4pop3/pop3_clientb
417   teem-1.6.0-srcsrc/hest/parseHestb
421   teem-1.6.0-srcsrc/nrrd/filtb
432   libmspackmspack/lzxdb
433   cg_compiler_opensrctokensb
442   compilerscannerb
446   linux-2.4.23-pre3-testplatformkernel/schedb
480   linux-2.4.23-pre3-testplatformkernel/signalb
510   teem-1.6.0-srcsrc/gage/miscGageb
530   linux-2.4.23-pre3-testplatformfs/nfsd/statsb
532   teem-1.6.0-srcsrc/nrrd/supersetb
534   jikespg-1.3src/spacetabb
578   ttt-0.10.1.preprocsrc/connect4b
580   teem-1.6.0-srcsrc/echo/boundsb
601   cg_compiler_opensrcconstfoldb
699   jikespg-1.3src/globalsb
758   bzip2-1.0.2blocksortb
790   jpeg-6btransuppb
810   teem-1.6.0-srcsrc/nrrd/accessorsb
823   flex-2.5.31scanb
950   linux-2.4.23-pre3-testplatformnet/ipv4/routeb
1069   ttt-0.10.1.preprocsrc/tttb
1072   linux-2.4.23-pre3-testplatformkernel/sysb
1172   libmspackmspack/qtmdb
1453   linux-2.4.23-pre3-testplatformnet/ipv4/tcp_inputb
1643   bzip2-1.0.2compressb
2009   teem-1.6.0-srcsrc/nrrd/convertNrrdb
16144   linux-2.4.23-pre3-testplatformarch/testplatform/kernel/init_taskb

init_taskb is bogus:
union task_union init_task_union __attribute__((aligned(16384))) =
  { { state: 0, flags: 0, sigpending: 0, addr_limit: ((mm_segment_t) { (0) }), exec_domain: &default_exec_domain, lock_depth: -1, counter: (10*100/100), nice: (0), policy: 0, mm: ((void *)0), active_mm: &init_mm, cpus_runnable: ~0UL, cpus_allowed: ~0UL, run_list: { &(init_task_union.task.run_list), &(init_task_union.task.run_list) }, next_task: &init_task_union.task, prev_task: &init_task_union.task, p_opptr: &init_task_union.task, p_pptr: &init_task_union.task, thread_group: { &(init_task_union.task.thread_group), &(init_task_union.task.thread_group) }, wait_chldexit: { lock: (spinlock_t) { }, task_list: { &(init_task_union.task.wait_chldexit).task_list, &(init_task_union.task.wait_chldexit).task_list }, }, real_timer: { function: it_real_fn }, cap_effective: (~0 & ~(1 << (8))), cap_inheritable: (0), cap_permitted: (~0), keep_capabilities: 0, rlim: { { (~0UL), (~0UL) }, { (~0UL), (~0UL) }, { (~0UL), (~0UL) }, { (8*1024*1024), (~0UL) }, { 0, (~0UL) }, { (~0UL), (~0UL) }, { 0, 0 }, { 1024, 1024 }, { (~0UL), (~0UL) }, { (~0UL), (~0UL) }, { (~0UL), (~0UL) }, }, user: (&root_user), comm: "swapper", thread: {{0,{{0},{0},{0},{0},{0},{0},{0},{0},{0},{0}, {0},{0},{0},{0},{0},{0}}}, 0, 0, sizeof((init_task_union.stack)) + (addr_t) &(init_task_union.stack), ((unsigned long)((addr_t) &swapper_pg_dir[0]) + (0x4|0x1|0x40|0x100)), 0,0,0, (per_struct) {{{{0,}}},0,0,0,0,{{0,}}}, 0, 0, 0 }, fs: &init_fs, files: &init_files, sigmask_lock: (spinlock_t) { }, sig: &init_signals, pending: { ((void *)0), &init_task_union.task.pending.head, {{0}}}, blocked: {{0}}, alloc_lock: (spinlock_t) { }, journal_info: ((void *)0), } };

tsk... Looks like GCC 4.3 ignored overly large alignments of variables.
Comment 40 Jan Hubicka 2010-11-13 19:02:03 UTC
Ahh, got the signs wrong. Those was main wins, main growths are
-3101   linux-2.4.23-pre3-testplatformfs/ext3/superb
-1622   compilervasb
-1597   linux-2.4.23-pre3-testplatformdrivers/char/n_ttyb
-1385   teem-1.6.0-srcsrc/limn/test/soidb
-1321   linux-2.4.23-pre3-testplatformmm/filemapb
-1287   linux-2.4.23-pre3-testplatformfs/ext3/iallocb
-1116   linux-2.4.23-pre3-testplatformfs/ext3/inodeb
-1057   teem-1.6.0-srcsrc/limn/test/tpsb
-1039   cg_compiler_opensrcsupportb
-941   teem-1.6.0-srcsrc/ell/quatb
-876   teem-1.6.0-srcsrc/nrrd/winKernelb
-861   teem-1.6.0-srcsrc/nrrd/measureb
-828   OpenTCP-1.0.4ethernetb
-805   linux-2.4.23-pre3-testplatformkernel/exec_domainb
-702   jikespg-1.3src/ctabsb
-701   linux-2.4.23-pre3-testplatformfs/nfsd/vfsb
-681   linux-2.4.23-pre3-testplatformfs/bufferb
-681   linux-2.4.23-pre3-testplatformmm/memoryb
-680   teem-1.6.0-srcsrc/echo/test/trendb
-646   teem-1.6.0-srcsrc/nrrd/kernelb
-613   linux-2.4.23-pre3-testplatformnet/core/devb
-613   linux-2.4.23-pre3-testplatformnet/ipv4/igmpb
-605   bzip2-1.0.2decompressb
-601   linux-2.4.23-pre3-testplatformfs/ext3/ballocb
-577   teem-1.6.0-srcsrc/echo/colorb
-574   linux-2.4.23-pre3-testplatformmm/shmemb
-514   compilerparserb
-511   linux-2.4.23-pre3-testplatformfs/nameib
-497   linux-2.4.23-pre3-testplatformfs/pipeb
-482   linux-2.4.23-pre3-testplatformfs/jbd/transactionb
-480   linux-2.4.23-pre3-testplatformfs/nfs/writeb
-473   libpng-1.2.5pngrtranb
-456   linux-2.4.23-pre3-testplatformfs/jbd/journalb
-417   linux-2.4.23-pre3-testplatformfs/namespaceb
-414   teem-1.6.0-srcsrc/ten/tendSatinb
-405   linux-2.4.23-pre3-testplatformnet/sunrpc/xprtb
-403   cg_compiler_opensrcprintutilsb
-403   teem-1.6.0-srcsrc/echo/intxb
-396   linux-2.4.23-pre3-testplatformarch/testplatform/kernel/debugb
-383   teem-1.6.0-srcsrc/mite/rayb
-379   linux-2.4.23-pre3-testplatformnet/sunrpc/schedb
-367   linux-2.4.23-pre3-testplatformnet/netlink/af_netlinkb
-362   linux-2.4.23-pre3-testplatformmm/mmapb
-359   linux-2.4.23-pre3-testplatformnet/ipv4/af_inetb
-356   linux-2.4.23-pre3-testplatformnet/ipv4/tcp_minisocksb
-342   linux-2.4.23-pre3-testplatformnet/sunrpc/svcsockb
-333   teem-1.6.0-srcsrc/limn/qnb
-328   libpng-1.2.5pngsetb
-328   linux-2.4.23-pre3-testplatformfs/lockd/svclockb
-327   linux-2.4.23-pre3-testplatformfs/nfsd/nfs3procb
-316   linux-2.4.23-pre3-testplatformdrivers/char/memb
-315   linux-2.4.23-pre3-testplatformnet/core/neighbourb
-307   compilercgb
-305   linux-2.4.23-pre3-testplatformlib/vsprintfb

Anyway that source bove is really bogus.
Comment 41 Jan Hubicka 2010-11-14 09:06:41 UTC
Created attachment 22389 [details]
preprocessed ext/super.c

Hi,
this testcase shows that we are no longer able to optimize away ext3_sops
in
 sb->s_magic = (__builtin_constant_p((__u16)((es->s_magic))) ? ({ __u16 __x = (((es->s_magic))); ((__u16)( (((__u16)(__x) & (__u16)0x00ffU) << 8) | (((__u16)(__x) & (__u16)0xff00U) >> 8) )); }) : __fswab16(((es->s_magic))));
 if (sb->s_magic != 0xEF53) {
  if (!silent)
   printk("<3>"
          "VFS: Can't find ext3 filesystem on dev %s.\n",
          bdevname(dev));
  goto failed_mount;
 }

The problem is that varpool code now expect that variable accesses are optimized out at the end of gimple queue instead of using TREE_SYMBOL_REFERENCED bit.  In this case sb->s_magic shomehow manages to get undefined and in GIMPLE land we keep the conditional around, while in RTL land init-regs initialize it to 0 that consequently makes mount to always fail.

I guess it is not real code quality bug since it happens only on undefined behaviour code, but we might consider doing initialization by zero sometime in gimple queue, too.

Also I am not quite sure if we are not misoptimizing the code, it is convoluted.
Comment 42 Jan Hubicka 2010-11-14 09:27:09 UTC
Fixing little bug in unreachable function removal and working around PR46470
gets me to:
-2933   linux-2.4.23-pre3-testplatformfs/ext3/superb  8069
-1572   linux-2.4.23-pre3-testplatformdrivers/char/n_ttyb  6038
-1385   teem-1.6.0-srcsrc/limn/test/soidb  5396
-1267   linux-2.4.23-pre3-testplatformfs/ext3/iallocb  2582
-1203   linux-2.4.23-pre3-testplatformmm/filemapb  15592
-1057   teem-1.6.0-srcsrc/limn/test/tpsb  4159
-1010   linux-2.4.23-pre3-testplatformfs/ext3/inodeb  12315
-843   teem-1.6.0-srcsrc/ell/quatb  11314
-834   cg_compiler_opensrcsupportb  27636
-828   OpenTCP-1.0.4ethernetb  1495
-820   teem-1.6.0-srcsrc/nrrd/winKernelb  16288
-807   teem-1.6.0-srcsrc/nrrd/measureb  9444
-677   jikespg-1.3src/ctabsb  48223
-656   teem-1.6.0-srcsrc/echo/test/trendb  9744
-619   linux-2.4.23-pre3-testplatformmm/memoryb  8051
-605   bzip2-1.0.2decompressb  8454
-595   teem-1.6.0-srcsrc/nrrd/kernelb  21446
-586   linux-2.4.23-pre3-testplatformfs/nfsd/vfsb  9082
-582   linux-2.4.23-pre3-testplatformfs/ext3/ballocb  4095
-572   teem-1.6.0-srcsrc/echo/colorb  7500
-546   linux-2.4.23-pre3-testplatformfs/bufferb  14617
-521   linux-2.4.23-pre3-testplatformnet/ipv4/igmpb  12205
-511   linux-2.4.23-pre3-testplatformnet/core/devb  11233
-462   libpng-1.2.5pngrtranb  21062
-427   linux-2.4.23-pre3-testplatformfs/nfs/writeb  6586
-419   linux-2.4.23-pre3-testplatformfs/nameib  11526
-414   teem-1.6.0-srcsrc/ten/tendSatinb  5479
-404   linux-2.4.23-pre3-testplatformfs/jbd/transactionb  10746
-387   teem-1.6.0-srcsrc/echo/intxb  10122
-380   teem-1.6.0-srcsrc/mite/rayb  3376
-357   linux-2.4.23-pre3-testplatformfs/jbd/journalb  9266
-349   cg_compiler_opensrcprintutilsb  12377
-340   linux-2.4.23-pre3-testplatformfs/namespaceb  5715
-317   teem-1.6.0-srcsrc/limn/qnb  2544
-311   linux-2.4.23-pre3-testplatformnet/ipv4/tcp_minisocksb  5169
-306   linux-2.4.23-pre3-testplatformnet/sunrpc/xprtb  8042
-291   linux-2.4.23-pre3-testplatformnet/netlink/af_netlinkb  5792
-289   linux-2.4.23-pre3-testplatformlib/vsprintfb  4266
-286   linux-2.4.23-pre3-testplatformnet/sunrpc/svcsockb  6488
-274   compilervasb  6631

What surprise me is that so far there was no inliner (or size estimate) related problems... Does the ARM regression reported still exist?
I suppose the fact that we do see regression at our x86_64 CSiBE testing might be just because of the unforunate 8 page alignments in the kernel.
Comment 43 Jan Hubicka 2010-11-14 11:06:59 UTC
Created attachment 22392 [details]
Preprocessed ialloc.

One of smaller units that grows a lot.  The culprint seems to be ext3_new_inode that is a lot smaller in GCC 4.3 variant. Not sure why, the inlining decisions seems sane.
Comment 44 Jan Hubicka 2010-11-14 16:16:35 UTC
OK, ialloc is because 4.3 folds:
  oldbit_430 = 0;
  D.12699_431 = oldbit_430 & 1;
  D.12698_462 = D.12699_431;
  D.12095_241 = D.12698_462;
  if (D.12095_241 != 0)
    goto <bb 71>;
  else
    goto <bb 72>;

In mainline the same sequence misses oldbit_430 = 0.

static __inline__ int
test_and_set_bit_simple(unsigned long nr, volatile void * addr)
{
 unsigned long reg1, reg2;
        int oldbit;

        return oldbit & 1;
}





static __inline__ int
test_and_clear_bit_simple(unsigned long nr, volatile void * addr)
{
 unsigned long reg1, reg2;
        int oldbit;


        return oldbit & 1;
}





static __inline__ int
test_and_change_bit_simple(unsigned long nr, volatile void * addr)
{
 unsigned long reg1, reg2;
        int oldbit;


        return oldbit & 1;
}

So another source code bug.
Richard, do you remember if we dropped initialization by zero for uninitialized vars?  

I am officially declaring kernel part of CSiBE irrelevant and will look at the other tests.

Honza
Comment 45 rguenther@suse.de 2010-11-14 17:38:46 UTC
On Sun, 14 Nov 2010, hubicka at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40436
> 
> --- Comment #44 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-11-14 16:16:35 UTC ---
> OK, ialloc is because 4.3 folds:
>   oldbit_430 = 0;
>   D.12699_431 = oldbit_430 & 1;
>   D.12698_462 = D.12699_431;
>   D.12095_241 = D.12698_462;
>   if (D.12095_241 != 0)
>     goto <bb 71>;
>   else
>     goto <bb 72>;
> 
> In mainline the same sequence misses oldbit_430 = 0.
> 
> static __inline__ int
> test_and_set_bit_simple(unsigned long nr, volatile void * addr)
> {
>  unsigned long reg1, reg2;
>         int oldbit;
> 
>         return oldbit & 1;
> }
> 
> 
> 
> 
> 
> static __inline__ int
> test_and_clear_bit_simple(unsigned long nr, volatile void * addr)
> {
>  unsigned long reg1, reg2;
>         int oldbit;
> 
> 
>         return oldbit & 1;
> }
> 
> 
> 
> 
> 
> static __inline__ int
> test_and_change_bit_simple(unsigned long nr, volatile void * addr)
> {
>  unsigned long reg1, reg2;
>         int oldbit;
> 
> 
>         return oldbit & 1;
> }
> 
> So another source code bug.
> Richard, do you remember if we dropped initialization by zero for uninitialized
> vars?  

I don't even remember that we did that.  Btw, CCP should be able
to fold it with UNDEFINED given that & 1 cannot yield zero (but
we're very conservative here now due to past bugs ...)
Comment 46 Jan Hubicka 2010-11-19 08:18:11 UTC
I think CCP is not folding one way or another as it might lead to inconsistent control flows (i.e. one place assuming that the given undefined value is 0, other that 1). Initializing them to 0 is one consistent way to get stuff optimized out.  I am not biggest friend of that (I think they should stay undefined and liveness should handle it well) but givent hat RTL land does it anyway there is no point to bother.

Anyway I looked at the date for jump in CSiBE sizes in our tester.  It is May 30 2009
It coincide with Richard's EH unwind reorg
2009-05-29  Richard Henderson  <rth@redhat.com>

        * cfgcleanup.c (try_crossjump_to_edge): Only skip past
        NOTE_INSN_BASIC_BLOCK.
        * cfglayout.c (duplicate_insn_chain): Copy epilogue insn marks.
        Duplicate NOTE_INSN_EPILOGUE_BEG notes.
        * cfgrtl.c (can_delete_note_p): Allow NOTE_INSN_EPILOGUE_BEG
        to be deleted.
        * dwarf2out.c (struct cfa_loc): Change indirect field to bitfield,
        add in_use field.
        (add_cfi): Disable check redefining cfa away from drap.
        (lookup_cfa_1): Add remember argument; handle remember/restore.
        (lookup_cfa): Pass remember argument.
        (cfa_remember): New.

given that it makes tables asynchronous and it is thus correcntess issue and given that it is misaccounted as code size by "size" command and thus also CSiBE and given that I analyzed quite few testcases and found no inliner bugs, I think the bug no longer exists (-Os inliner was improved a lot).

It would be nice if someone tested it using the cross, for now I am putting it into waiting stage and will open new PRs for the few new issues I noticed.
Comment 47 Richard Biener 2010-12-16 13:03:00 UTC
GCC 4.5.2 is being released, adjusting target milestone.
Comment 48 Richard Biener 2011-04-28 14:51:14 UTC
GCC 4.5.3 is being released, adjusting target milestone.
Comment 49 Steven Bosscher 2011-09-09 14:39:52 UTC
Bug without a clear issue, and in WAITING state for >6 months -> closing.