[Bug tree-optimization/14741] graphite with loop blocking and interchanging doesn't optimize a matrix multiplication loop

Mon May 18 06:28:00 GMT 2015

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14741

--- Comment #32 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Thomas Koenig from comment #31)
> If the middle end is not up to this, should we be looking at doing loop
> blocking in the Fortran front end, at least for the Matmul intrinsic?

I think this makes sense; fixing this issue in the middle end seems to be a
project on a much longer timescale. Ideally, matmul would expand to something
that generates good code even at, e.g., -O2 -march=native (which would require
both blocking and unrolling). At that point, the inlined code would be faster
than the runtime library for all sizes.
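For readers unfamiliar with the two transformations, here is a minimal C sketch of what "blocking and unrolling" means for a matrix multiplication loop nest. It is only an illustration under stated assumptions: it is not the code gfortran emits or would emit for MATMUL, it uses row-major C arrays (Fortran is column-major, so the index order would differ), and the function names, the block size BS = 64, and the unroll factor of 4 are all hypothetical choices, not tuned values.

```c
#include <stddef.h>

/* Hypothetical block size; a real expansion would tune this to the cache. */
#define BS 64

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference: the plain triple loop, C += A * B, n x n, row-major. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Same computation, blocked (tiled) over i, k and j so each BS x BS tile
   is reused while it is still in cache, with the innermost j loop
   unrolled by a hypothetical factor of 4. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS) {
                size_t imax = min_sz(ii + BS, n);
                size_t kmax = min_sz(kk + BS, n);
                size_t jmax = min_sz(jj + BS, n);
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double a = A[i * n + k];
                        const double *b = &B[k * n];
                        double *c = &C[i * n];
                        size_t j = jj;
                        /* Unrolled body: four independent updates per trip. */
                        for (; j + 4 <= jmax; j += 4) {
                            c[j]     += a * b[j];
                            c[j + 1] += a * b[j + 1];
                            c[j + 2] += a * b[j + 2];
                            c[j + 3] += a * b[j + 3];
                        }
                        /* Remainder when the tile width is not a multiple of 4. */
                        for (; j < jmax; j++)
                            c[j] += a * b[j];
                    }
            }
}
```

Both functions accumulate into C, so the caller zeroes it first. The point of the comment above is that a compiler-side expansion would have to produce something of this shape (plus vectorization) at -O2 to compete with a tuned library.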