This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2.
- From: Mircea Namolaru <NAMOLARU at il dot ibm dot com>
- To: Andrey Belevantsev <abel at ispras dot ru>
- Cc: Ayal Zaks <zaks at il dot ibm dot com>, Dorit Nuzman <dorit at il dot ibm dot com>, gcc-patches at gcc dot gnu dot org, Vladimir Yanovsky <yanov at il dot ibm dot com>, vmakarov at redhat dot com, Vladimir Yanovsky <volodyan at gmail dot com>
- Date: Thu, 22 Mar 2007 17:45:17 +0200
- Subject: Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2.
Hello Andrey,
We looked at your changes - they are fine.
The path was tested successfully (bootstrap + regression testing) on a
PowerPC.
Checking now also on a Cell SPU processor where there is a doloop pattern
without PARALLEL.
Vladimir & Mircea
>
> FWIW, I'd like to comment on how the patches behave on ia64. We were
> trying to fix SMS to run it on ia64. Two fixes were needed: removal of
> SMS profitability estimation, which is done by the first patch of this
> series, and correct generation of loop counter decrement in
> generate_prolog_epilog. On top of these two patches, we did the
> following to correct these and related problems:
>
> > doloop_register_get (rtx insn ATTRIBUTE_UNUSED)
> ...
> > ! if (!INSN_P (PREV_INSN (insn)))
> > ! return NULL_RTX;
> > !
>
> Here, you wanted to support doloop patterns without PARALLEL; however,
> this spoils the support for PARALLEL doloops. We had a doloop where the
> jump was preceded with NOTE_INSN_DELETED; this loop failed that check
> and was not recognized as a doloop. I have fixed the problem here; you
> may also want to remove this check and fix doloop_condition_get so that
> it'll skip notes within a loop when finding an increment insn.
>
>
> > /* Generate the prolog, inserting its insns on the loop-entry edge.
*/
> > start_sequence ();
> >
> > ! if (count_reg && !count_init)
> > /* Generate a subtract instruction at the beginning of the prolog
to
> > adjust the loop count by STAGE_COUNT. */
> > ! emit_move_insn (count_reg, gen_rtx_PLUS (GET_MODE (count_reg),
> > ! count_reg,
> > ! GEN_INT (-last_stage)));
>
> Here we used expand_simple_binop instead of emit_move_insn, which
> doesn't work on ia64 (there is no way to generate this within a single
> insn).
>
> > !
> >
> > for (i = 0; i < last_stage; i++)
> > duplicate_insns_of_cycles (ps, 0, i, 1);
> >
> > + /* Remove the decrements of the count_reg from the prolog.
> > + It has been already adjusted. */
> > + insn = get_insns ();
> > + if (count_reg && !count_init)
> > + insn = NEXT_INSN (insn);
>
> And here, you were assuming again that the decrement will be a single
> insn. In our case, the following loop killed the decrement we've just
> created.
>
> FWIW, the attached patch on top of your fixes allows gcc to bootstrap
> with BOOT_CFLAGS="-O2 -fmodulo-sched" on ia64. It also passes a test
> run of SPEC CPU 2000 except gcc (for which I don't have fixed 64-bit
> compatible sources), crafty and galgel (those also fail without
> -fmodulo-sched). Could you test this on ppc?
>
> Andrey
> --- tmp/modulo-sched.c 2007-03-13 16:01:38.000000000 +0300
> +++ sms-fixes/gcc/modulo-sched.c 2007-03-15 16:28:53.000000000 +0300
> @@ -279,7 +279,10 @@ doloop_register_get (rtx insn ATTRIBUTE_
> if (!JUMP_P (insn))
> return NULL_RTX;
>
> - if (!INSN_P (PREV_INSN (insn)))
> + /* When a jump is not a PARALLEL, we expect a previous insn
> + to be an increment. */
> + if (GET_CODE (PATTERN (insn)) != PARALLEL
> + && !INSN_P (PREV_INSN (insn)))
> return NULL_RTX;
>
> condition = doloop_condition_get (insn);
> @@ -648,38 +651,46 @@ generate_prolog_epilog (partial_schedule
> int i;
> int last_stage = PS_STAGE_COUNT (ps) - 1;
> edge e;
> - rtx insn;
> -
> + rtx insn, last_sub_insn = NULL_RTX;
>
> /* Generate the prolog, inserting its insns on the loop-entry edge.
*/
> start_sequence ();
>
> if (count_reg && !count_init)
> - /* Generate a subtract instruction at the beginning of the prolog to
> - adjust the loop count by STAGE_COUNT. */
> - emit_move_insn (count_reg, gen_rtx_PLUS (GET_MODE (count_reg),
> - count_reg,
> - GEN_INT (-last_stage)));
> + {
> + rtx sub_reg = NULL_RTX;
> +
> + sub_reg = expand_simple_binop (GET_MODE (count_reg), MINUS,
> + count_reg, GEN_INT (last_stage),
> + count_reg, 1, OPTAB_DIRECT);
> + gcc_assert (REG_P (sub_reg));
> + if (REGNO (sub_reg) != REGNO (count_reg))
> + emit_move_insn (count_reg, sub_reg);
>
> + /* Save the last insn generated so we'd search for excessive
> + decrements below starting from it. */
> + last_sub_insn = get_insns ();
> + while (NEXT_INSN (last_sub_insn))
> + last_sub_insn = NEXT_INSN (last_sub_insn);
> + }
>
> for (i = 0; i < last_stage; i++)
> duplicate_insns_of_cycles (ps, 0, i, 1);
>
> - /* Remove the decrements of the count_reg from the prolog.
> - It has been already adjusted. */
> - insn = get_insns ();
> - if (count_reg && !count_init)
> - insn = NEXT_INSN (insn);
> - while (insn)
> - {
> - if (INSN_P (insn) && single_set (insn)
> - && rtx_equal_p (count_reg, SET_DEST (single_set (insn))))
> - insn = delete_insn (insn);
> - else
> - insn = NEXT_INSN (insn);
> - }
> -
> -
> + /* Remove the decrements of the count_reg from the prolog.
> + It has been already adjusted. */
> + insn = get_insns ();
> + if (count_reg && !count_init)
> + insn = NEXT_INSN (last_sub_insn);
> + while (insn)
> + {
> + if (INSN_P (insn) && single_set (insn)
> + && rtx_equal_p (count_reg, SET_DEST (single_set (insn))))
> + insn = delete_insn (insn);
> + else
> + insn = NEXT_INSN (insn);
> + }
> +
> /* Put the prolog on the entry edge. */
> e = loop_preheader_edge (loop);
> split_edge_and_insert (e, get_insns ());