This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])


On 10/04/2012 03:24 AM, Steven Bosscher wrote:
On Thu, Oct 4, 2012 at 8:57 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
On Thu, Oct 4, 2012 at 5:30 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
I was going to look at this code too but I was interesting in generation of
less points and live ranges.  It is strange that in my profiles,
remove_some_program_points_and_update_live_ranges takes 0.6% of compiler
time on these huge tests.   So I was not interesting to speed up the
function and may be therefore you have no visible change in compilation
time.
Right. The compression algorithm doesn't care much about the initial
number of program points, only about the number of live ranges before
and after compression. I had expected a bigger effect on the number of
live ranges before compression.

0.6% sounds really very different from my timings. How much time does
create_start_finish_chains take for you?


I don't object the idea of the patch.  I need some time to look at it (the
different results on a function is a bit scary for me) and check simulator
times on other tests.
Understood.
BTW, it would be great if you can also look at this additional patch hunk:

@@ -994,8 +1044,8 @@ lra_create_live_ranges (bool all_p)
    curr_point = 0;
    point_freq_vec = VEC_alloc (int, heap, get_max_uid () * 2);
    lra_point_freq = VEC_address (int, point_freq_vec);
-  FOR_EACH_BB (bb)
-    process_bb_lives (bb);
+  FOR_EACH_BB_REVERSE (bb)
+    process_bb_lives (bb, curr_point);
    lra_live_max_point = curr_point;
    create_start_finish_chains ();
    if (lra_dump_file != NULL)

I think this should result in more live ranges being merged. Here's
why I think so, based on my far worse understanding of this code than
yours, so forgive me if I'm Completely Wrong :-)
No, you are not wrong.
Two days ago, I worked on patch which contains the same code. The patch actually takes EBB into account to decrease # calls of mark_pseudo_live at the beginning of process_bb_lives and mark_pseudo_dead at the function end and for that I needed FOR_EACH_BB_REVERSE. The patch was half baked (it did not checked hard regs live changes at the end of BB to set up right hard reg conflicts for pseudos) but it gave an idea how much I can get from this. It is not bad but not what I expected. So I stopped work on this. But we still should work on these ideas as they improve LRA speed in small steps (many small steps will create a visible effect).


We can really solve scalability problem only by using simpler but still good enough algorithms (too simple algorithms result in big code size and actually even in worse compilation times). I've been working on it and I'll send a patch soon.
process_bb_lives walks insns in the basic block from last to first, so
say you have a basic block chain 1->2->3, and each block has 4 insns,
then AFAIU the program points in block 1 will be [4,3,2,1], in block 2
it will be [8,7,6,5], and in block 3 it will be [12,11,10,9]. Say a
reg is used in block 3 at point 11, and set in block at point 3. Then
this reg will have a live range chain [3-1],[8-5],[12-11].

If you visit the basic blocks in reverse order, the program points
will be: 1:[12,11,10,9], 2:[8,7,6,5], 3:[4,3,2,1]. Now the same reg
will be set at point 11 and used at point 3, and the live range chain
will be just [11-3].




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]