This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [tree-ssa] Speeding things up


>
>   Right now I'm working on addressing some severe performance
>   issues with the tree-ssa branch.  The issue in my gunsight right
>   now is the building of our may alias information.  For the testcase
>   I've got 20001226-1.c computing may alias takes about 30 minutes.
>
>   Fixing some algorithmic issues gets that down to around 13 minutes.

>
>   However, even after fixing the key algorithmic issues, we're still doing
>   2 billion silly calls that are totally unnecessary and account
>   for the top two and 5 of the top 10 cpu hogs.  You might think inlining
>   them is the answer.  Reality couldn't be further from the truth.

Here, you've got this many calls because you haven't actually fixed the
algorithmic issues. You just think you have, apparently.

Compute may aliases is *always* O(N^2) in the current form (and this is
the real cause of all the calls below) , because it's algorithmically bad.
It currently does:
	For every referenced variable a
		For every referenced variable b
			see if a aliases b

where "see if a aliases b" is what calls the functions below.

For 27,000 variables, thats 27000^2 = ~780 million calls.

For the subset of about 24000 variables we couldn't directly prove, we
call get_alias_set 24000^2 = 580 million times.

etc.

Nothing you do will make it any less O(N^2) as long as this is the main
loop.

It needs to walk points to sets PTA computes for each variable, not every
referenced variable for each variable.

it should be
	For every referenced variable
	    For each variable in the points-to set
		See if we can prove they don't alias through TBAA or
	        another method.
		otherwise, they alias

Since PTA computes a set of, on average, 2.6 variables a given variable
could point to, that would reduce the number of calls to about 80 or 90
thousand in each case.

Doing it this way will *never* be worse than the existing way, since
the largest set PTA can come up with is all variables.
However, it will be O(2.6 * N) in the average case, rather than O(N^2).

Also, this has nothing to do with inlining, since you *haven't* fixed
the algorithmic problem.
Once you've done that, if the overhead of calling is still a lot, then
inlining makes sense.

By the by, if you had mentioned this problem on the public mailing list,
you probably would have gotten a lot more help a lot quicker.
I've mentioned this problem to Diego before, as well.

>
>   get_base_symbol        780 million calls
>     Recursive function with a switch statement.
>
>   get_alias_set          520 million calls
>     Big and ugly.  Has recursive calls and callbacks
>     via function pointers into the front-ends
>
>   find_base_decl         520 million calls
>     Recursive function with a switch statement
>
>   c_common_get_alias_set 520 million calls
>     Called indirectly via get_alias_set
>
>   alias_sets_conflict_p  260 million calls
>     Not terribly bad, but more code than IMHO we should force inlined.
>
>
>   In all we're talking about over 2.6 billion function calls (and this is
>   *after* the algorithmic improvements).  For reference we really
>   need about 100k calls total.
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]