[PATCH] Modulo-scheduling improvements. Patch 2 of 2.

Mircea Namolaru NAMOLARU@il.ibm.com
Thu Mar 22 16:27:00 GMT 2007


Hello Andrey,

We looked at your changes - they are fine. 

The path was tested successfully (bootstrap + regression testing) on a 
PowerPC.
Checking now also on a Cell SPU processor  where there is a doloop pattern 
without PARALLEL.

Vladimir & Mircea
 

>
> FWIW, I'd like to comment on how the patches behave on ia64.  We were 
> trying to fix SMS to run it on ia64.  Two fixes were needed:  removal of 

>   SMS profitability estimation, which is done by the first patch of this 

> series, and correct generation of loop counter decrement in 
> generate_prolog_epilog.  On top of these two patches, we did the 
> following to correct these and related problems:
> 
> >  doloop_register_get (rtx insn ATTRIBUTE_UNUSED)
> ...
> > !   if (!INSN_P (PREV_INSN (insn)))
> > !     return NULL_RTX;
> > !
> 


> Here, you wanted to support doloop patterns without PARALLEL; however, 
> this spoils the support for PARALLEL doloops.  We had a doloop where the 

> jump was preceded with NOTE_INSN_DELETED; this loop failed that check 
> and was not recognized as a doloop.  I have fixed the problem here; you 
> may also want to remove this check and fix doloop_condition_get so that 
> it'll skip notes within a loop when finding an increment insn.
> 
> 
> >    /* Generate the prolog, inserting its insns on the loop-entry edge. 
 */
> >    start_sequence ();
> > 
> > !   if (count_reg && !count_init)
> >     /* Generate a subtract instruction at the beginning of the prolog 
to
> >        adjust the loop count by STAGE_COUNT.  */
> > !    emit_move_insn (count_reg, gen_rtx_PLUS (GET_MODE (count_reg),
> > !                                            count_reg,
> > !                                            GEN_INT (-last_stage)));
> 
> Here we used expand_simple_binop instead of emit_move_insn, which 
> doesn't work on ia64 (there is no way to generate this within a single 
> insn).
> 
> > !
> > 
> >    for (i = 0; i < last_stage; i++)
> >      duplicate_insns_of_cycles (ps, 0, i, 1);
> > 
> > +    /* Remove the decrements of the count_reg from the prolog.
> > +       It has been already adjusted.  */
> > +    insn = get_insns ();
> > +    if (count_reg && !count_init)
> > +      insn = NEXT_INSN (insn);
> 
> And here, you were assuming again that the decrement will be a single 
> insn.  In our case, the following loop killed the decrement we've just 
> created.
> 
> FWIW, the attached patch on top of your fixes allows gcc to bootstrap 
> with BOOT_CFLAGS="-O2 -fmodulo-sched" on ia64.  It also passes a test 
> run of SPEC CPU 2000 except gcc (for which I don't have fixed 64-bit 
> compatible sources), crafty and galgel (those also fail without 
> -fmodulo-sched).  Could you test this on ppc?
> 
> Andrey
> --- tmp/modulo-sched.c   2007-03-13 16:01:38.000000000 +0300
> +++ sms-fixes/gcc/modulo-sched.c   2007-03-15 16:28:53.000000000 +0300
> @@ -279,7 +279,10 @@ doloop_register_get (rtx insn ATTRIBUTE_
>    if (!JUMP_P (insn))
>      return NULL_RTX;
> 
> -  if (!INSN_P (PREV_INSN (insn)))
> +  /* When a jump is not a PARALLEL, we expect a previous insn 
> +     to be an increment.  */
> +  if (GET_CODE (PATTERN (insn)) != PARALLEL 
> +      && !INSN_P (PREV_INSN (insn)))
>      return NULL_RTX;
> 
>    condition = doloop_condition_get (insn);
> @@ -648,38 +651,46 @@ generate_prolog_epilog (partial_schedule
>    int i;
>    int last_stage = PS_STAGE_COUNT (ps) - 1;
>    edge e;
> -  rtx insn;
> - 
> +  rtx insn, last_sub_insn = NULL_RTX;
> 
>    /* Generate the prolog, inserting its insns on the loop-entry edge. 
*/
>    start_sequence ();
> 
>    if (count_reg && !count_init)
> -   /* Generate a subtract instruction at the beginning of the prolog to
> -      adjust the loop count by STAGE_COUNT.  */
> -   emit_move_insn (count_reg, gen_rtx_PLUS (GET_MODE (count_reg),
> -                                           count_reg,
> -                                           GEN_INT (-last_stage)));
> +    {
> +      rtx sub_reg = NULL_RTX;
> + 
> +      sub_reg = expand_simple_binop (GET_MODE (count_reg), MINUS,
> +                                     count_reg, GEN_INT (last_stage), 
> +                                     count_reg, 1, OPTAB_DIRECT);
> +      gcc_assert (REG_P (sub_reg));
> +      if (REGNO (sub_reg) != REGNO (count_reg))
> +        emit_move_insn (count_reg, sub_reg);
> 
> +      /* Save the last insn generated so we'd search for excessive 
> +         decrements below starting from it.  */
> +      last_sub_insn = get_insns ();
> +      while (NEXT_INSN (last_sub_insn))
> +        last_sub_insn = NEXT_INSN (last_sub_insn);
> +    }
> 
>    for (i = 0; i < last_stage; i++)
>      duplicate_insns_of_cycles (ps, 0, i, 1);
> 
> -   /* Remove the decrements of the count_reg from the prolog.
> -      It has been already adjusted.  */
> -   insn = get_insns ();
> -   if (count_reg && !count_init)
> -     insn = NEXT_INSN (insn);
> -   while (insn)
> -     {
> -       if (INSN_P (insn) && single_set (insn) 
> -           && rtx_equal_p (count_reg, SET_DEST (single_set (insn))))
> -         insn = delete_insn (insn);
> -       else
> -         insn = NEXT_INSN (insn);
> -     }
> -
> -
> +  /* Remove the decrements of the count_reg from the prolog.
> +     It has been already adjusted.  */
> +  insn = get_insns ();
> +  if (count_reg && !count_init)
> +    insn = NEXT_INSN (last_sub_insn);
> +  while (insn)
> +    {
> +      if (INSN_P (insn) && single_set (insn) 
> +          && rtx_equal_p (count_reg, SET_DEST (single_set (insn))))
> +        insn = delete_insn (insn);
> +      else
> +        insn = NEXT_INSN (insn);
> +    }
> + 
>    /* Put the prolog on the entry edge.  */
>    e = loop_preheader_edge (loop);
>    split_edge_and_insert (e, get_insns ());



More information about the Gcc-patches mailing list