This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
| Other format: | [Raw text] | |
Here is an initial version of the modulo scheduler
(following the previous discussion on the DDG raised in
http://gcc.gnu.org/ml/gcc/2003-12/msg00631.html).
This initial implementation is a first attempt to go through
all the steps of modulo-scheduling simple loops; there are many
Todo's and various parameters to tune. Here are a couple of the
more challenging items.
1. Building inter-loop (memory) dependences
-------------------------------------------
Dan wrote (in http://gcc.gnu.org/ml/gcc/2003-12/msg00685.html):
>I *strongly* recommend against reviving dependence.c, for the following
reasons:
[snip]
We thought we could build inter-loop dependences by simply duplicating the
loop
body and calling "sched_analyze (&tmp_deps, head, tail);". We were wrong.
If we
have
(1) t1 = a[i]
(2) t2 = a[i-1]
(3) a[i] = t1 + t2
(4) i++
and duplicate it:
(1') t1 = a[i]
(2') t2 = a[i-1]
(3') a[i] = t1 + t2
(4') i++
we do not see that (2') is directly dependent on (3); there is a transitive
dependence through (4). Trying to relax the anti-dependence of (4) on (3)
(to get more parallelism) disconnects (2') from (3) completely.
Renaming i++ into j = i+1 (in 4, 1', 2', 3', which is what we currently do)
results in having both (1') and (2') directly dependent on (3).
We could try to:
1. Fold the "i+1" into uses of j after/during renaming, to produce
(1') t1 = a[i+1] and (2') t2 = a[i]. Hopefully, sched_analyze will then
work
appropriately.
2. Perform dependence analysis in RTL, that calculates the distance between
a
given pair of memory-references (inside a loop).
3. Pass dependence info from tree-ssa (monev analyzer?), and maintain this
info
until reaching the modulo-scheduler (primary concern: loop unroller),
following Dan's suggestion:
>If you must have better tree->rtl dependence info passed down (which is
what dependence.c
>did), i'd suggest starting with Sebastian Pop's recently posted work, and
having the
>results passed down to the RTL level somewhat like dependence.c does.
>
>Though starting with some simple RTL dependence and then using more
advanced tree
>dependence info passed down to RTL when tree-ssa arrives sounds like a
good option.
2. Loop analysis
----------------
Our initial implementation was intentionally independent of loop analysis
information. We currently check explicitely for single basic-block loops
with an explicit doloop branch at the end and constant trip count. It would
be
better to (re)use iv/doloop/unrolling analysis (especially for dependence
analysis
((2) above)), if they will (continue to) be available for use by the modulo
scheduler in RTL, and not disappear in tree-ssa.
3. Subsequent rescheduling
--------------------------
Currently SMS is invoked immediately prior to the first scheduling pass.
The loops
that undergo modulo scheduling are then rescheduled by schedule_insns (and
then
again after reload). Note that SMS introduces register-move instructions to
cope
with modulo-variable-expansion, and these moves are not scheduled too
carefully by
SMS.
---
Vlad wrote (in http://gcc.gnu.org/ml/gcc/2003-12/msg00659.html):
>Another thing is to output ddg for vcg (personally I prefer graphiz but
>is distributed not under GPL therefore we does not support it). My
>experience show it will be a critical component for software pipelining
>debugging.
Done.
For a simple example,
float dot (float* a, float* b, int n)
{
int i;
float result = 0;
for (i=0; i<7; i++)
result += a[i]*b[i];
return result;
}
is modulo-scheduled (on powerpc-apple-darwin6.4 with -O3) into:
_dot:
mflr r6
stw r31,-8(r1)
li r2,0
li r0,6 /* adjusting loop count */
bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
mtctr r0
slwi r5,r2,2
addi r2,r2,1
mflr r31
stw r6,16(r1)
lfsx f13,r5,r3 /* prolog */
lfsx f0,r5,r4 /* prolog */
addis r9,r31,ha16(LC0-"L00000000001$pb")
lfs f1,lo16(LC0-"L00000000001$pb")(r9)
L8:
slwi r7,r2,2
fmadds f1,f13,f0,f1
addi r2,r2,1
lfsx f13,r7,r3
lfsx f0,r7,r4
bdnz L8
lwz r3,16(r1)
fmadds f1,f13,f0,f1 /* epilog */
lwz r31,-8(r1)
mtlr r3
blr
[Patches relative to gcc 3.5.0 20040127]
(See attached file: modulo-sched.c)
(See attached file: modulo-order.c)
(See attached file: modulo-ps.h)
(See attached file: modulo-ps.c)
(See attached file: ddg.h)
(See attached file: ddg.c)
(See attached file: existing-files.diff)
Ayal and Mustafa.Attachment:
modulo-sched.c
Description: Binary data
Attachment:
modulo-order.c
Description: Binary data
Attachment:
modulo-ps.h
Description: Binary data
Attachment:
modulo-ps.c
Description: Binary data
Attachment:
ddg.h
Description: Binary data
Attachment:
ddg.c
Description: Binary data
Attachment:
existing-files.diff
Description: Binary data
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |