This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug fortran/68600] Inlined MATMUL is too slow.
- From: "jvdelisle at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 02 Dec 2015 05:48:17 +0000
- Subject: [Bug fortran/68600] Inlined MATMUL is too slow.
- Auto-submitted: auto-generated
- References: <bug-68600-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600
--- Comment #8 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Created attachment 36887
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36887&action=edit
A faster version
I took the example code found at
http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm/, where the register-based
vector computations explicitly use the SSE registers, and converted it to use
the built-in GCC vector extensions. I had to experiment a little to find
equivalents for some of the operations in the original code.
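To illustrate the kind of conversion described above, here is a minimal sketch
of a 2x2 micro-kernel written with the GCC vector extensions. It is not the
code from attachment 36887; the names v2df and kernel_2x2, the column-major
layout, and the 2x2 block size are all assumptions made for this example.

```c
#include <assert.h>

/* Two doubles packed into one 16-byte vector (GCC vector extension).
   The type name v2df is chosen for this sketch. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical micro-kernel: C(0:1,0:1) += A(0:1,0:k-1) * B(0:k-1,0:1).
   All matrices are assumed column-major with leading dimensions
   lda, ldb and ldc. */
void kernel_2x2(int k, const double *a, int lda,
                const double *b, int ldb,
                double *c, int ldc)
{
    assert(k >= 0 && lda >= 2 && ldb >= k && ldc >= 2);

    v2df c0 = {0.0, 0.0};   /* accumulates column 0 of the C block */
    v2df c1 = {0.0, 0.0};   /* accumulates column 1 of the C block */

    for (int p = 0; p < k; p++) {
        /* Two consecutive rows of column p of A. */
        v2df a_col = {a[p * lda], a[p * lda + 1]};
        /* B(p,0) and B(p,1), each duplicated into both lanes. */
        v2df b0 = {b[p], b[p]};
        v2df b1 = {b[p + ldb], b[p + ldb]};

        /* Element-wise vector multiply-add; the compiler chooses the
           instructions for the target given by -march. */
        c0 += a_col * b0;
        c1 += a_col * b1;
    }

    /* Write the accumulated block back, one lane at a time. */
    c[0]       += c0[0];
    c[1]       += c0[1];
    c[ldc]     += c1[0];
    c[ldc + 1] += c1[1];
}
```

Because the arithmetic is written with ordinary operators on vector types, the
same source can be compiled for different targets without rewriting
target-specific intrinsic calls.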
With only -O2 and -march=native I am getting good results. I still need to
roll this into the other test program to confirm that the gflops are being
computed correctly. The Diff value compares the result against the naive
reference implementation to check that the computation is correct.
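For reference, the two reported quantities could be computed roughly as in the
sketch below. It assumes the conventional count of 2*n^3 floating-point
operations for an n x n matrix product and takes Diff to be the largest
absolute element-wise difference from the reference result; whether the test
program does exactly this is not confirmed here. The function names
gemm_gflops and max_abs_diff are hypothetical.

```c
#include <assert.h>

/* Gflops for an n x n by n x n product that took `seconds` to run,
   assuming 2*n^3 floating-point operations (one multiply and one add
   per inner-loop step). */
double gemm_gflops(int n, double seconds)
{
    assert(n > 0 && seconds > 0.0);
    return 2.0 * n * n * n / (seconds * 1.0e9);
}

/* Largest absolute element-wise difference between two m x n
   column-major matrices x and y with leading dimensions ldx and ldy. */
double max_abs_diff(int m, int n,
                    const double *x, int ldx,
                    const double *y, int ldy)
{
    double max_diff = 0.0;

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double d = x[j * ldx + i] - y[j * ldy + i];
            if (d < 0.0)
                d = -d;
            if (d > max_diff)
                max_diff = d;
        }
    }
    return max_diff;
}
```

Under these assumptions, a Diff that stays near double-precision rounding
error as the size grows indicates that the optimized kernel matches the naive
reference.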
MY_MMult = [
Size: 40, Gflops: 1.828571e+00, Diff: 2.664535e-15
Size: 80, Gflops: 3.696751e+00, Diff: 7.105427e-15
Size: 120, Gflops: 4.051583e+00, Diff: 1.065814e-14
Size: 160, Gflops: 4.015686e+00, Diff: 1.421085e-14
Size: 200, Gflops: 4.029212e+00, Diff: 2.131628e-14
Size: 240, Gflops: 3.972414e+00, Diff: 2.486900e-14
Size: 280, Gflops: 3.881188e+00, Diff: 2.842171e-14
Size: 320, Gflops: 3.872371e+00, Diff: 3.552714e-14
Size: 360, Gflops: 3.887676e+00, Diff: 4.973799e-14
Size: 400, Gflops: 3.862052e+00, Diff: 4.973799e-14
Size: 440, Gflops: 3.886575e+00, Diff: 4.973799e-14
Size: 480, Gflops: 3.910124e+00, Diff: 6.039613e-14
Size: 520, Gflops: 3.863706e+00, Diff: 6.394885e-14
Size: 560, Gflops: 3.976947e+00, Diff: 6.750156e-14
Size: 600, Gflops: 4.002631e+00, Diff: 7.460699e-14
Size: 640, Gflops: 3.992507e+00, Diff: 8.171241e-14
Size: 680, Gflops: 3.964570e+00, Diff: 9.237056e-14
Size: 720, Gflops: 3.973661e+00, Diff: 1.101341e-13
Size: 760, Gflops: 3.982346e+00, Diff: 1.065814e-13
Size: 800, Gflops: 3.869291e+00, Diff: 8.881784e-14
Size: 840, Gflops: 3.936271e+00, Diff: 1.065814e-13
Size: 880, Gflops: 3.931259e+00, Diff: 1.030287e-13
Size: 920, Gflops: 3.912907e+00, Diff: 1.207923e-13
Size: 960, Gflops: 3.938391e+00, Diff: 1.278977e-13
Size: 1000, Gflops: 3.945754e+00, Diff: 1.421085e-13