This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Add zero-overhead looping for xtensa backend


Hi Sterling,

    Since the patch is delayed for a long time, I'm kind of pushing it. Sorry for that. 
    Yeah, you are right. We have some performance issue here as GCC may use one more general register in some cases with this patch. 
    Take the following arraysum testcase for example. In doloop optimization, GCC figures out that the number of iterations is 1024 and creates a new pseudo 79 as the new trip count register. 
    The pseudo 79 is live throughout the loop, this makes the register pressure in the loop higher. And it's possible that this new pseudo is spilled by reload when the register pressure is very high. 
    I know that the xtensa loop instruction copies the trip count register into the LCOUNT special register. And we need describe this hardware feature in GCC in order to free the trip count register. 
    But I find it difficult to do. Do you have any good suggestions on this? 

arraysum.c:
int g[1024];
int g_sum;

void test_entry ()
{
        int i, Sum = 0;

        for (i = 0; i < 1024; i++)
          Sum = Sum + g[i];

        g_sum = Sum;
}


1. RTL before the doloop optimization pass(arraysum.c.193r.loop2_invariant):
(note 34 0 32 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 32 34 36 2 NOTE_INSN_FUNCTION_BEG)
(insn 36 32 37 2 (set (reg:SI 72 [ ivtmp$8 ])
        (mem/u/c:SI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [2  S4 A32])) 29 {movsi_internal}
     (expr_list:REG_EQUAL (symbol_ref:SI ("g")  <var_decl 0x7f6eef5d62d0 g>)
        (nil)))
(insn 37 36 33 2 (set (reg/f:SI 76 [ D.1393 ])
        (mem/u/c:SI (symbol_ref/u:SI ("*.LC3") [flags 0x2]) [2  S4 A32])) 29 {movsi_internal}
     (expr_list:REG_EQUAL (const:SI (plus:SI (symbol_ref:SI ("g")  <var_decl 0x7f6eef5d62d0 g>)
                (const_int 4096 [0x1000])))
        (nil)))
(insn 33 37 42 2 (set (reg/v:SI 74 [ Sum ])
        (const_int 0 [0])) arraysum.c:6 29 {movsi_internal}
     (nil))
(code_label 42 33 38 3 2 "" [0 uses])
(note 38 42 39 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 39 38 40 3 (set (reg:SI 77 [ MEM[base: _14, offset: 0B] ])
        (mem:SI (reg:SI 72 [ ivtmp$8 ]) [2 MEM[base: _14, offset: 0B]+0 S4 A32])) arraysum.c:9 29 {movsi_internal}
     (nil))
(insn 40 39 41 3 (set (reg/v:SI 74 [ Sum ])
        (plus:SI (reg/v:SI 74 [ Sum ])
            (reg:SI 77 [ MEM[base: _14, offset: 0B] ]))) arraysum.c:9 1 {addsi3}
     (expr_list:REG_DEAD (reg:SI 77 [ MEM[base: _14, offset: 0B] ])
        (nil)))
(insn 41 40 43 3 (set (reg:SI 72 [ ivtmp$8 ])
        (plus:SI (reg:SI 72 [ ivtmp$8 ])
            (const_int 4 [0x4]))) 1 {addsi3}
     (nil))
(jump_insn 43 41 52 3 (set (pc)
        (if_then_else (ne (reg:SI 72 [ ivtmp$8 ])
                (reg/f:SI 76 [ D.1393 ]))
            (label_ref:SI 52)
            (pc))) arraysum.c:8 39 {*btrue}
     (int_list:REG_BR_PROB 9899 (nil))
 -> 52)
(code_label 52 43 51 5 3 "" [1 uses])
(note 51 52 44 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(note 44 51 45 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 45 44 46 4 (set (reg/f:SI 78)
        (mem/u/c:SI (symbol_ref/u:SI ("*.LC4") [flags 0x2]) [2  S4 A32])) arraysum.c:11 29 {movsi_internal}
     (expr_list:REG_EQUAL (symbol_ref:SI ("g_sum")  <var_decl 0x7f6eef5d6360 g_sum>)
        (nil)))
(insn 46 45 0 4 (set (mem/c:SI (reg/f:SI 78) [2 g_sum+0 S4 A32])
        (reg/v:SI 74 [ Sum ])) arraysum.c:11 29 {movsi_internal}
     (expr_list:REG_DEAD (reg/f:SI 78)
        (expr_list:REG_DEAD (reg/v:SI 74 [ Sum ])
            (nil))))


2. RTL after the doloop optimization pass(arraysum.c.195r.loop2_doloop):
(note 34 0 32 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 32 34 36 2 NOTE_INSN_FUNCTION_BEG)
(insn 36 32 37 2 (set (reg:SI 72 [ ivtmp$8 ])
        (mem/u/c:SI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [2  S4 A32])) 29 {movsi_internal}
     (expr_list:REG_EQUAL (symbol_ref:SI ("g")  <var_decl 0x7f6eef5d62d0 g>)
        (nil)))
(insn 37 36 33 2 (set (reg/f:SI 76 [ D.1393 ])
        (mem/u/c:SI (symbol_ref/u:SI ("*.LC3") [flags 0x2]) [2  S4 A32])) 29 {movsi_internal}
     (expr_list:REG_EQUAL (const:SI (plus:SI (symbol_ref:SI ("g")  <var_decl 0x7f6eef5d62d0 g>)
                (const_int 4096 [0x1000])))
        (nil)))
(insn 33 37 54 2 (set (reg/v:SI 74 [ Sum ])
        (const_int 0 [0])) arraysum.c:6 29 {movsi_internal}
     (nil))
(insn 54 33 42 2 (set (reg:SI 79)
        (const_int 1024 [0x400])) arraysum.c:6 -1
     (nil))
(code_label 42 54 38 3 2 "" [0 uses])
(note 38 42 39 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 39 38 40 3 (set (reg:SI 77 [ MEM[base: _14, offset: 0B] ])
        (mem:SI (reg:SI 72 [ ivtmp$8 ]) [2 MEM[base: _14, offset: 0B]+0 S4 A32])) arraysum.c:9 29 {movsi_internal}
     (nil))
(insn 40 39 41 3 (set (reg/v:SI 74 [ Sum ])
        (plus:SI (reg/v:SI 74 [ Sum ])
            (reg:SI 77 [ MEM[base: _14, offset: 0B] ]))) arraysum.c:9 1 {addsi3}
     (expr_list:REG_DEAD (reg:SI 77 [ MEM[base: _14, offset: 0B] ])
        (nil)))
(insn 41 40 53 3 (set (reg:SI 72 [ ivtmp$8 ])
        (plus:SI (reg:SI 72 [ ivtmp$8 ])
            (const_int 4 [0x4]))) 1 {addsi3}
     (nil))
(jump_insn 53 41 52 3 (parallel [
            (set (pc)
                (if_then_else (ne (reg:SI 79)
                        (const_int 1 [0x1]))
                    (label_ref 52)
                    (pc)))
            (set (reg:SI 79)
                (plus:SI (reg:SI 79)
                    (const_int -1 [0xffffffffffffffff])))
            (unspec [
                    (const_int 0 [0])
                ] 13)
            (clobber (scratch:SI))
        ]) -1
     (int_list:REG_BR_PROB 9899 (nil))
 -> 52)
(code_label 52 53 51 5 3 "" [1 uses])
(note 51 52 44 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(note 44 51 45 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 45 44 46 4 (set (reg/f:SI 78)
        (mem/u/c:SI (symbol_ref/u:SI ("*.LC4") [flags 0x2]) [2  S4 A32])) arraysum.c:11 29 {movsi_internal}
     (expr_list:REG_EQUAL (symbol_ref:SI ("g_sum")  <var_decl 0x7f6eef5d6360 g_sum>)
        (nil)))
(insn 46 45 0 4 (set (mem/c:SI (reg/f:SI 78) [2 g_sum+0 S4 A32])
        (reg/v:SI 74 [ Sum ])) arraysum.c:11 29 {movsi_internal}
     (expr_list:REG_DEAD (reg/f:SI 78)
        (expr_list:REG_DEAD (reg/v:SI 74 [ Sum ])
            (nil))))


> 
> On Tue, Oct 14, 2014 at 8:39 AM, Felix Yang <fei.yang0953@gmail.com> wrote:
> > PINGï
> > Cheers,
> > Felix
> 
> Felix,
> 
> This isn't my day job, 24-hour pings are unproductive.
> 
> You shouldn't need to worry about the trip count register getting spilled. It
> makes no difference whatsoever to how the loop operates--the trip count is
> dead with regards to the loop once the instruction executes. You don't need to
> describe LCOUNT to gcc in order for this not to matter. It should be enough to
> describe the zcl as consuming the value in the same way a branch instruction
> consumes a value.
> 
> If you have a case where spilling it is causing a problem, then there is a bug in
> your code, papered over by dropping case when it is spilled. Similarly with
> iter_reg_used_outside--it shouldn't affect whether or not a zcl is valid here. If
> you have a case where it does, there is likely a bug in your code.
> 
> If the code is easier to write by maintaining trip_count up, then fine (for now);
> you give up some performance (in fact, a lot of performance), but that doesn't
> matter as to the correctness.
> 
> 
> >
> >
> > On Tue, Oct 14, 2014 at 12:30 AM, Felix Yang <fei.yang0953@gmail.com>
> wrote:
> >> Thanks for the comments.
> >>
> >> The patch checked the usage of teh trip count register, making sure
> >> that it is not used in the loop body other than the doloop_end or
> >> lives past the doloop_end instruction, as the following code snippet
> >> shows:
> >>
> >> +  /* Scan all the blocks to make sure they don't use iter_reg.  */
> >> + if (loop->iter_reg_used || loop->iter_reg_used_outside)
> >> +    {
> >> +      if (dump_file)
> >> +        fprintf (dump_file, ";; loop %d uses iterator\n",
> >> +                 loop->loop_no);
> >> +      return false;
> >> +    }
> >>
> >>     For the spill issue, I think we need to handle it. The reason is
> >> that currently we are not telling GCC about the existence of the
> >> LCOUNT register. Instead, we keep the trip count in a general
> >> register and it's possible that this register can be spilled when
> >> register pressure is high.
> >>     It's a good idea to post another patch to describe the LCOUNT
> >> register in GCC in order to free this general register. But I want
> >> this patch applied as a first step, OK?
> >>
> >> Cheers,
> >> Felix
> >>
> >>
> >> On Tue, Oct 14, 2014 at 12:09 AM, augustine.sterling@gmail.com
> >> <augustine.sterling@gmail.com> wrote:
> >>> On Fri, Oct 10, 2014 at 6:59 AM, Felix Yang <fei.yang0953@gmail.com>
> wrote:
> >>>> Hi Sterling,
> >>>>
> >>>>     I made some improvement to the patch. Two changes:
> >>>>     1. TARGET_LOOPS is now used as a condition of the doloop
> >>>> related patterns, which is more elegant.
> >>>
> >>> Fine.
> >>>
> >>>>     2. As the trip count register of the zero-cost loop maybe
> >>>> potentially spilled, we need to change the patterns in order to
> >>>> handle this issue.
> >>>
> >>> Actually, for xtensa you don't. The trip count is copied into LCOUNT
> >>> at the execution of the loop instruction, and therefore a spill or
> >>> whatever doesn't matter--it won't affect the result. So as long as
> >>> you have the trip count at the start of the loop, you are fine.
> >>>
> >>> This does bring up an issue of whether or not the trip count can be
> >>> modified during the loop. (note that this is different than early
> >>> exit.) If it can, you can't use a zero-overhead loop. Does your
> >>> patch address this case.
> >>>
> >>> The solution is similar to that adapted by c6x backend.
> >>>> Just turn the zero-cost loop into a regular loop when that happens
> >>>> when reload is completed.
> >>>>     Attached please find version 4 of the patch. Make check
> >>>> regression tested with xtensa-elf-gcc/simulator.
> >>>>     OK for trunk?

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]