This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/14741] graphite with loop blocking and interchanging doesn't optimize a matrix multiplication loop
- From: "spop at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sun, 14 Jul 2013 07:09:47 +0000
- Subject: [Bug tree-optimization/14741] graphite with loop blocking and interchanging doesn't optimize a matrix multiplication loop
- Auto-submitted: auto-generated
- References: <bug-14741-4 at http dot gcc dot gnu dot org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14741
--- Comment #18 from Sebastian Pop <spop at gcc dot gnu.org> ---
On my ARM Exynos5 laptop at 1.6GHz I get this:
gfortran -ffast-math -O3 t.f90
./a.out
192.75000000000000 10.239999999999826
gfortran -ffast-math -O3 -fgraphite -floop-interchange -floop-block t.f90
./a.out
193.77500000000001 10.239999999999826
gfortran -ffast-math -O3 -floop-nest-optimize t.f90
t.f90: In function ‘MAIN__’:
t.f90:5:0: warning: iteration 31 invokes undefined behavior
[-Waggressive-loop-optimizations]
B=0.1D0
^
f951: note: containing loop
t.f90:4:0: warning: iteration 31 invokes undefined behavior
[-Waggressive-loop-optimizations]
A=0.1D0
^
f951: note: containing loop
./a.out
./a.out: No such file or directory
I don't know why the compiler does not produce an executable: -S does produce a
.s file.
Adding -fdump-tree-graphite-all produces a file t.f90.106t.graphite containing
the information about what graphite has done: I see that we do block the
loop nest like this:
gfortran -ffast-math -O3 -floop-nest-optimize -fdump-tree-graphite-all t.f90
CLAST generated by CLooG:
for (scat_0=0;scat_0<=1023;scat_0+=32) {
for (scat_1=0;scat_1<=1023;scat_1+=32) {
for (scat_2=scat_0;scat_2<=scat_0+31;scat_2++) {
for (scat_3=scat_1;scat_3<=scat_1+31;scat_3++) {
(scat_2,scat_3);
}
}
}
}
I see that the tile size is hard coded as a constant in graphite-optimize-isl.c:
TileMap = getTileMap(ctx, *Dimensions, 32);
That should be replaced by a param and tuned.
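
To make the tuning point concrete, here is a minimal sketch in plain C of the same blocked iteration order as the CLAST above, with the tile size passed in as an argument instead of the constant 32. This is not GCC code: the names tiled_walk, visit_fn and count_point are hypothetical and exist only for illustration. The upper-bound clamps are an assumption added so that sizes that are not a multiple of the tile still work; the CLAST above needs none because 1024 is a multiple of 32.

```c
/* Hypothetical callback type: invoked once per (i, j) point. */
typedef void (*visit_fn)(int i, int j, void *ctx);

static int min_int(int a, int b) { return a < b ? a : b; }

/* Walk an n x n iteration space in tile x tile blocks.
   The two outer loops step over tile origins (scat_0, scat_1 in the CLAST),
   the two inner loops cover the points inside one tile (scat_2, scat_3). */
void tiled_walk(int n, int tile, visit_fn visit, void *ctx)
{
    for (int t0 = 0; t0 < n; t0 += tile)
        for (int t1 = 0; t1 < n; t1 += tile)
            for (int i = t0; i < min_int(t0 + tile, n); i++)
                for (int j = t1; j < min_int(t1 + tile, n); j++)
                    visit(i, j, ctx);
}

/* Example callback: counts visited points through a long counter in ctx. */
static void count_point(int i, int j, void *ctx)
{
    (void)i;
    (void)j;
    ++*(long *)ctx;
}
```

Calling tiled_walk(1024, 32, count_point, &count) visits the same points as the dump above; changing the second argument is the kind of knob a param would expose, and which value is best presumably depends on the cache sizes of the target.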