This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])


On 12-10-03 11:35 AM, Steven Bosscher wrote:
On Wed, Oct 3, 2012 at 4:56 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
On 12-10-03 3:13 AM, Steven Bosscher wrote:
On Tue, Oct 2, 2012 at 3:42 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
+/* Compress pseudo live ranges by removing program points where
+   nothing happens.  Complexity of many algorithms in LRA is linear
+   function of program points number.  To speed up the code we try to
+   minimize the number of the program points here.  */
+static void
+remove_some_program_points_and_update_live_ranges (void)
Genuine question, but could we do this on the fly instead,
by not incrementing curr_point if the current point had no value?

I suppose the main complication would be checking cases where
all births are recorded by extending the previous just-closed live
range rather than starting a new one, in which case it's the previous
point that needs to be reused.  Hmm...
It does seem to be something worth investigating further. Things like:

Compressing live ranges: from 1742579 to 554532 - 31%
Compressing live ranges: from 1742569 to 73069 - 4%

look extreme, but they're actually the norm. For the same test case
(PR54146 still) but looking only at functions with initially 1000-9999
program points, you get this picture:

Compressing live ranges: from 1766 to 705 - 39%
Compressing live ranges: from 1449 to 370 - 25%
Compressing live ranges: from 3939 to 1093 - 27%
Compressing live ranges: from 3939 to 1093 - 27%
Compressing live ranges: from 3939 to 1093 - 27%
Compressing live ranges: from 3939 to 1093 - 27%
Compressing live ranges: from 2464 to 676 - 27%
Compressing live ranges: from 1433 to 379 - 26%
Compressing live ranges: from 1261 to 348 - 27%
Compressing live ranges: from 2835 to 755 - 26%
Compressing live ranges: from 5426 to 1678 - 30%
Compressing live ranges: from 5227 to 1477 - 28%
Compressing live ranges: from 1845 to 467 - 25%
Compressing live ranges: from 4868 to 1378 - 28%
Compressing live ranges: from 4875 to 1388 - 28%
Compressing live ranges: from 4882 to 1384 - 28%
Compressing live ranges: from 5688 to 1714 - 30%
Compressing live ranges: from 4943 to 1310 - 26%
Compressing live ranges: from 2976 to 792 - 26%
Compressing live ranges: from 5463 to 1526 - 27%
Compressing live ranges: from 2854 to 730 - 25%
Compressing live ranges: from 1810 to 745 - 41%
Compressing live ranges: from 2771 to 904 - 32%
Compressing live ranges: from 4916 to 1429 - 29%
Compressing live ranges: from 6505 to 2238 - 34%
Compressing live ranges: from 6493 to 166 - 2%
Compressing live ranges: from 5498 to 1734 - 31%
Compressing live ranges: from 1810 to 745 - 41%
Compressing live ranges: from 5043 to 1420 - 28%
Compressing live ranges: from 3094 to 788 - 25%
Compressing live ranges: from 4563 to 1311 - 28%
Compressing live ranges: from 4557 to 158 - 3%
Compressing live ranges: from 1050 to 274 - 26%
Compressing live ranges: from 1602 to 434 - 27%
Compressing live ranges: from 2474 to 600 - 24%
Compressing live ranges: from 2718 to 866 - 31%
Compressing live ranges: from 2097 to 716 - 34%
Compressing live ranges: from 4152 to 1099 - 26%
Compressing live ranges: from 5065 to 1514 - 29%
Compressing live ranges: from 1236 to 359 - 29%
Compressing live ranges: from 1722 to 541 - 31%
Compressing live ranges: from 5186 to 1401 - 27%

Unfortunately the dump doesn't mention how many live ranges could be
merged thanks to the compression.

It'd also be good to understand why the compression ratios are so
small, and consistently around ~30%.
This sentence is not clear to me.  30% means 3 times less points. It is
pretty good compression.

   Maybe curr_point includes things
it should ignore (DEBUG_INSNs, NOTEs, ...)?

After the compression, there are only points important for conflict info,
i.e. only points where some reg dies or start living.  Even more if on the
subsequent points there are only life starts or only deaths,  they are
represented by one point after the compression.
With this patch the amount of compression is reduced. Without the
patch the compression ratio is typically around 30%, with the patch
this goes up to ~65%. The worst compression ratios (where worse is
better in this case :-) are:

$ grep Compressing log4 | egrep " [78].%"
Compressing live ranges: from 31 to 22 - 70%, pre_count 17, post_count 15
Compressing live ranges: from 128 to 94 - 73%, pre_count 87, post_count 77
Compressing live ranges: from 32 to 23 - 71%, pre_count 16, post_count 15
Compressing live ranges: from 38 to 29 - 76%, pre_count 19, post_count 18
Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24
Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24
Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24
Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33
Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33
Compressing live ranges: from 125 to 89 - 71%, pre_count 71, post_count 62
Compressing live ranges: from 10 to 8 - 80%, pre_count 6, post_count 6
Compressing live ranges: from 54 to 39 - 72%, pre_count 35, post_count 30
Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33
Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33
Compressing live ranges: from 194 to 143 - 73%, pre_count 127, post_count 114
Compressing live ranges: from 2804 to 2047 - 73%, pre_count 2932,
post_count 2841
Compressing live ranges: from 194 to 143 - 73%, pre_count 127, post_count 114
Compressing live ranges: from 2184 to 1533 - 70%, pre_count 1458,
post_count 1366
Compressing live ranges: from 20 to 16 - 80%, pre_count 10, post_count 10

(pre_count and post_count are the number of live ranges before/after
compressing)

The "worst" result is this:
Compressing live ranges: from 726174 to 64496 - 8%, pre_count
40476128, post_count 12483414

But that's still a lot better than before the patch for the same function:
Compressing live ranges: from 1742569 to 73069 - 4%, pre_count
40842330, post_count 12479992

I'm not sure why I end up with fewer program points overall, I had
expected that remove_some_program_points_and_update_live_ranges would
make the post-compression numbers the same for the unpatched and
patched compiler. But oh well... There is no difference for >90% of
the cases so probably I just happen to trigger a few extra merges that
remove_some_program_points_and_update_live_ranges before the patch
somehow overlooked and kept disjoint.

Unfortunately, and much to my chagrin, there is almost no pay-off on
compile time. This is not completely unexpected, because the most
costly loops in e.g. remove_some_program_points_and_update_live_ranges
are unchanged (loop over all live ranges for all pseudos). Timings
before ("log") and after ("log4") the patch:

$ grep "LRA " log{,4} | egrep -o "^.*usr"
log: LRA non-specific        :  60.94 ( 5%) usr
log: LRA virtuals elimination:  60.24 ( 5%) usr
log: LRA reload inheritance  :   6.25 ( 1%) usr
log: LRA create live ranges  : 181.79 (16%) usr
log: LRA hard reg assignment : 132.33 (11%) usr
log: LRA coalesce pseudo regs:   2.48 ( 0%) usr
log4: LRA non-specific        :  60.52 ( 5%) usr
log4: LRA virtuals elimination:  62.44 ( 5%) usr
log4: LRA reload inheritance  :   6.45 ( 1%) usr
log4: LRA create live ranges  : 177.69 (15%) usr
log4: LRA hard reg assignment : 131.76 (11%) usr
log4: LRA coalesce pseudo regs:   2.60 ( 0%) usr

I was trying to make the
remove_some_program_points_and_update_live_ranges compression step
almost unnecessary (at least for the first iteration of
lra_create_live_ranges) because that would give a ~25% speedup for
"LRA create live ranges" for the PR54146 test case. But even with this
patch there are no functions for which
remove_some_program_points_and_update_live_ranges becomes a no-op :-(

Still, I would like to propose this for the lra-branch. Hopefully it's
possible to find out how to get closer to 100%, there is a potential
win there.

Bootstrapped and tested on x86_64-unknown-linux-gnu.
I also checked and double-checked that there are no code generation
differences on all pre-processed source files of cc1.

OK for lra-branch?

I was going to look at this code too but I was interesting in generation of less points and live ranges. It is strange that in my profiles, remove_some_program_points_and_update_live_ranges takes 0.6% of compiler time on these huge tests. So I was not interesting to speed up the function and may be therefore you have no visible change in compilation time.

I don't object the idea of the patch. I need some time to look at it (the different results on a function is a bit scary for me) and check simulator times on other tests.

Thanks for working on this issue, Steven.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]