This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

[graphite] fix quality of generated code

From: "Sebastian Pop" <sebpop at gmail dot com>
To: "GCC Patches" <gcc-patches at gcc dot gnu dot org>, "Richard Guenther" <rguenther at suse dot de>, "Daniel Berlin" <dberlin at dberlin dot org>
Date: Sat, 10 Jan 2009 16:36:32 -0600
Subject: [graphite] fix quality of generated code

Hi,

While fixing PR38786, I noticed that the generated code (after the
fix, which is still under test) looks horrible: in the innermost loop
we duplicate all the scalar code needed to address the array, as well
as the loads it requires.  Here is what the code looks like after gloog:

          bb_10 (preds = {bb_9 }, succs = {bb_11 })
          {
          <bb 10>:
            D.1718_123 = (int) graphiteIV.50_64;
            D.1719_124 = (long unsigned int) D.1718_123;
            D.1720_125 = D.1719_124 * 8;
            D.1721_126 = D.1633_30 + D.1720_125;
            # VUSE <SMT.13_53(D)> { SMT.13 }
            D.1722_127 = *D.1721_126;
            D.1723_128 = (int) graphiteIV.51_75;
            D.1724_129 = (long unsigned int) D.1723_128;
            D.1725_130 = D.1724_129 * 8;
            D.1726_131 = D.1722_127 + D.1725_130;
            # VUSE <SMT.14_54(D)> { SMT.14 }
            D.1727_132 = *D.1726_131;
            D.1708_113 = (int) graphiteIV.50_64;
            D.1709_114 = (long unsigned int) D.1708_113;
            D.1710_115 = D.1709_114 * 8;
            D.1711_116 = D.1618_13 + D.1710_115;
            # VUSE <SMT.13_53(D)> { SMT.13 }
            D.1712_117 = *D.1711_116;
            D.1713_118 = (int) graphiteIV.51_75;
            D.1714_119 = (long unsigned int) D.1713_118;
            D.1715_120 = D.1714_119 * 8;
            D.1716_121 = D.1712_117 + D.1715_120;
            # VUSE <SMT.14_54(D)> { SMT.14 }
            D.1717_122 = *D.1716_121;
            ivtmp.44_105 = graphiteIV.53_99;
            l_106 = (int) ivtmp.44_105;
            D.1627_107 = (long unsigned int) l_106;
            D.1628_108 = D.1627_107 * 4;
            D.1629_109 = D.1717_122 + D.1628_108;
            D.1638_110 = D.1727_132 + D.1628_108;
            # VUSE <SMT.15_136> { SMT.15 }
            D.1639_111 = *D.1638_110;
            # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
            *D.1629_109 = D.1639_111;
            l_112 = l_106 + 1;

          }

After scheduling some scalar cleanups (see the attached patch), the
code looks better, but LIM still does not move the loads with
loop-invariant addresses out of the loop.  I also tried scheduling
PRE, but that fails.  Here is the code after the second LIM:

          bb_10 (preds = {bb_9 }, succs = {bb_11 })
          {
          <bb 10>:
            # VUSE <SMT.13_53(D)> { SMT.13 }
            D.1722_127 = *D.1721_126;
            D.1726_131 = D.1722_127 + D.1725_130;
            # VUSE <SMT.14_54(D)> { SMT.14 }
            D.1727_132 = *D.1726_131;
            # VUSE <SMT.13_53(D)> { SMT.13 }
            D.1712_117 = *D.1711_116;
            D.1716_121 = D.1712_117 + D.1715_120;
            # VUSE <SMT.14_54(D)> { SMT.14 }
            D.1717_122 = *D.1716_121;
            l_106 = (int) graphiteIV.53_99;
            D.1627_107 = (long unsigned int) l_106;
            D.1628_108 = D.1627_107 * 4;
            D.1629_109 = D.1717_122 + D.1628_108;
            D.1638_110 = D.1727_132 + D.1628_108;
            # VUSE <SMT.15_136> { SMT.15 }
            D.1639_111 = *D.1638_110;
            # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
            *D.1629_109 = D.1639_111;

          }

Here is what I would expect to see after a more aggressive LIM:

          bb_10 (preds = {bb_9 }, succs = {bb_11 })
          {
          <bb 10>:
            l_106 = (int) graphiteIV.53_99;
            D.1627_107 = (long unsigned int) l_106;
            D.1628_108 = D.1627_107 * 4;
            D.1629_109 = D.1717_122 + D.1628_108;
            D.1638_110 = D.1727_132 + D.1628_108;
            # VUSE <SMT.15_136> { SMT.15 }
            D.1639_111 = *D.1638_110;
            # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
            *D.1629_109 = D.1639_111;

          }
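To make the expected transformation concrete, here is a source-level
sketch in C.  It assumes the loop nest has the shape
a[i][j][l] = b[i][j][l] over int *** arrays, which is only inferred
from the 8-byte and 4-byte strides in the dumps above; it is not the
PR38786 testcase.  The extents NI, NJ, NL and all function names are
hypothetical.  Hoisting is valid here because the int stores cannot
modify the pointer tables, which matches the dump: the loads use
SMT.13 and SMT.14 while the store defines SMT.15.

```c
#include <stdlib.h>

enum { NI = 3, NJ = 4, NL = 5 };  /* hypothetical loop extents */

/* Shape of the code before LIM: a[i][j] and b[i][j] are reloaded
   on every iteration of the innermost loop.  */
static void copy_naive(int ***a, int ***b)
{
  for (int i = 0; i < NI; i++)
    for (int j = 0; j < NJ; j++)
      for (int l = 0; l < NL; l++)
        a[i][j][l] = b[i][j][l];
}

/* Shape expected after a more aggressive LIM: the two invariant
   pointer loads are done once per (i, j) pair.  */
static void copy_hoisted(int ***a, int ***b)
{
  for (int i = 0; i < NI; i++)
    for (int j = 0; j < NJ; j++)
      {
        int *dst = a[i][j];  /* plays the role of D.1717_122 */
        int *src = b[i][j];  /* plays the role of D.1727_132 */
        for (int l = 0; l < NL; l++)
          dst[l] = src[l];
      }
}

/* Allocate an NI x NJ x NL array of ints filled with distinct values.
   Allocation failures are not handled in this sketch.  */
static int ***alloc3(int seed)
{
  int ***p = malloc(NI * sizeof *p);
  for (int i = 0; i < NI; i++)
    {
      p[i] = malloc(NJ * sizeof *p[i]);
      for (int j = 0; j < NJ; j++)
        {
          p[i][j] = malloc(NL * sizeof *p[i][j]);
          for (int l = 0; l < NL; l++)
            p[i][j][l] = seed + (i * NJ + j) * NL + l;
        }
    }
  return p;
}

static void free3(int ***p)
{
  for (int i = 0; i < NI; i++)
    {
      for (int j = 0; j < NJ; j++)
        free(p[i][j]);
      free(p[i]);
    }
  free(p);
}

/* Return 1 if every element of x equals the matching element of y.  */
static int same3(int ***x, int ***y)
{
  for (int i = 0; i < NI; i++)
    for (int j = 0; j < NJ; j++)
      for (int l = 0; l < NL; l++)
        if (x[i][j][l] != y[i][j][l])
          return 0;
  return 1;
}

/* Return 1 if both versions copy src correctly.  */
int hoisting_preserves_result(void)
{
  int ***src = alloc3(100);
  int ***d1 = alloc3(0);
  int ***d2 = alloc3(0);
  copy_naive(d1, src);
  copy_hoisted(d2, src);
  int ok = same3(d1, src) && same3(d2, src);
  free3(src);
  free3(d1);
  free3(d2);
  return ok;
}
```

The inner loop of copy_hoisted corresponds to the eight-statement
basic block shown just above.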

Do you have suggestions for other scalar cleanup passes that could be
run after graphite?

Thanks,
Sebastian Pop
--
AMD - GNU Tools

Index: passes.c
===================================================================
--- passes.c	(revision 143190)
+++ passes.c	(working copy)
@@ -657,6 +657,12 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_loop_distribution);
 	  NEXT_PASS (pass_linear_transform);
 	  NEXT_PASS (pass_graphite_transforms);
+	    {
+	      struct opt_pass **p = &pass_graphite_transforms.pass.sub;
+	      NEXT_PASS (pass_copy_prop);
+	      NEXT_PASS (pass_dce_loop);
+	      NEXT_PASS (pass_lim);
+	    }
 	  NEXT_PASS (pass_iv_canon);
 	  NEXT_PASS (pass_if_conversion);
 	  NEXT_PASS (pass_vectorize);

Follow-Ups:
- Re: [graphite] fix quality of generated code
  - From: Richard Guenther
- Re: [graphite] fix quality of generated code
  - From: Daniel Berlin
