This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[graphite] fix quality of generated code
- From: "Sebastian Pop" <sebpop at gmail dot com>
- To: "GCC Patches" <gcc-patches at gcc dot gnu dot org>, "Richard Guenther" <rguenther at suse dot de>, "Daniel Berlin" <dberlin at dberlin dot org>
- Date: Sat, 10 Jan 2009 16:36:32 -0600
- Subject: [graphite] fix quality of generated code
Hi,
While fixing PR38786, I noticed that the generated code (after the
fix, which is still under test) looks horrible: in the innermost loop we
duplicate all the scalar code needed to address the array, as well as
the loads that the addressing requires. Here is what the code looks like
after gloog:
bb_10 (preds = {bb_9 }, succs = {bb_11 })
{
<bb 10>:
D.1718_123 = (int) graphiteIV.50_64;
D.1719_124 = (long unsigned int) D.1718_123;
D.1720_125 = D.1719_124 * 8;
D.1721_126 = D.1633_30 + D.1720_125;
# VUSE <SMT.13_53(D)> { SMT.13 }
D.1722_127 = *D.1721_126;
D.1723_128 = (int) graphiteIV.51_75;
D.1724_129 = (long unsigned int) D.1723_128;
D.1725_130 = D.1724_129 * 8;
D.1726_131 = D.1722_127 + D.1725_130;
# VUSE <SMT.14_54(D)> { SMT.14 }
D.1727_132 = *D.1726_131;
D.1708_113 = (int) graphiteIV.50_64;
D.1709_114 = (long unsigned int) D.1708_113;
D.1710_115 = D.1709_114 * 8;
D.1711_116 = D.1618_13 + D.1710_115;
# VUSE <SMT.13_53(D)> { SMT.13 }
D.1712_117 = *D.1711_116;
D.1713_118 = (int) graphiteIV.51_75;
D.1714_119 = (long unsigned int) D.1713_118;
D.1715_120 = D.1714_119 * 8;
D.1716_121 = D.1712_117 + D.1715_120;
# VUSE <SMT.14_54(D)> { SMT.14 }
D.1717_122 = *D.1716_121;
ivtmp.44_105 = graphiteIV.53_99;
l_106 = (int) ivtmp.44_105;
D.1627_107 = (long unsigned int) l_106;
D.1628_108 = D.1627_107 * 4;
D.1629_109 = D.1717_122 + D.1628_108;
D.1638_110 = D.1727_132 + D.1628_108;
# VUSE <SMT.15_136> { SMT.15 }
D.1639_111 = *D.1638_110;
# SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
*D.1629_109 = D.1639_111;
l_112 = l_106 + 1;
}
Now, after scheduling some scalar cleanups (see the attached patch), I
get better-looking code, but the loads from loop-invariant addresses
are still not moved out of the loop by LIM. I also tried to schedule a
PRE pass, but that fails. Here is the code after the second LIM:
bb_10 (preds = {bb_9 }, succs = {bb_11 })
{
<bb 10>:
# VUSE <SMT.13_53(D)> { SMT.13 }
D.1722_127 = *D.1721_126;
D.1726_131 = D.1722_127 + D.1725_130;
# VUSE <SMT.14_54(D)> { SMT.14 }
D.1727_132 = *D.1726_131;
# VUSE <SMT.13_53(D)> { SMT.13 }
D.1712_117 = *D.1711_116;
D.1716_121 = D.1712_117 + D.1715_120;
# VUSE <SMT.14_54(D)> { SMT.14 }
D.1717_122 = *D.1716_121;
l_106 = (int) graphiteIV.53_99;
D.1627_107 = (long unsigned int) l_106;
D.1628_108 = D.1627_107 * 4;
D.1629_109 = D.1717_122 + D.1628_108;
D.1638_110 = D.1727_132 + D.1628_108;
# VUSE <SMT.15_136> { SMT.15 }
D.1639_111 = *D.1638_110;
# SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
*D.1629_109 = D.1639_111;
}
Here is what I would expect to see after a more aggressive LIM:
bb_10 (preds = {bb_9 }, succs = {bb_11 })
{
<bb 10>:
l_106 = (int) graphiteIV.53_99;
D.1627_107 = (long unsigned int) l_106;
D.1628_108 = D.1627_107 * 4;
D.1629_109 = D.1717_122 + D.1628_108;
D.1638_110 = D.1727_132 + D.1628_108;
# VUSE <SMT.15_136> { SMT.15 }
D.1639_111 = *D.1638_110;
# SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
*D.1629_109 = D.1639_111;
}
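At the source level, the difference between the last two dumps can be sketched roughly as below. This is only an illustration: the PR38786 test case is not included in this mail, so the function names, the `int ***` types and the array extents are assumptions read off the dumps (8-byte pointer strides for the two outer indices, a 4-byte stride for `l`). The hoisting relies on the stores touching only SMT.15, while the pointer loads read SMT.13 and SMT.14.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical extents; the real test case is not shown in this mail.  */
enum { NI = 2, NJ = 3, NL = 4 };

/* Shape of the loop after the second LIM: the row pointers dst[i][j] and
   src[i][j] are reloaded from memory on every iteration of l.  */
void
copy_reload (int ***dst, int ***src, int i, int j, int n)
{
  for (int l = 0; l < n; l++)
    dst[i][j][l] = src[i][j][l];
}

/* Shape expected from a more aggressive LIM: the two invariant pointer
   loads are done once, before the loop.  */
void
copy_hoisted (int ***dst, int ***src, int i, int j, int n)
{
  int *d = dst[i][j];   /* plays the role of D.1717_122 */
  int *s = src[i][j];   /* plays the role of D.1727_132 */
  for (int l = 0; l < n; l++)
    d[l] = s[l];
}

/* Build small pointer tables and check that both variants copy the same
   data.  Returns 1 when both outputs equal the source.  */
int
copies_agree (void)
{
  static int src_data[NI][NJ][NL], out_a[NI][NJ][NL], out_b[NI][NJ][NL];
  static int *src_rows[NI][NJ], *a_rows[NI][NJ], *b_rows[NI][NJ];
  static int **src_tab[NI], **a_tab[NI], **b_tab[NI];

  for (int i = 0; i < NI; i++)
    {
      src_tab[i] = src_rows[i];
      a_tab[i] = a_rows[i];
      b_tab[i] = b_rows[i];
      for (int j = 0; j < NJ; j++)
        {
          src_rows[i][j] = src_data[i][j];
          a_rows[i][j] = out_a[i][j];
          b_rows[i][j] = out_b[i][j];
          for (int l = 0; l < NL; l++)
            src_data[i][j][l] = 100 * i + 10 * j + l;
        }
    }
  /* Sanity check on the table setup.  */
  assert (src_tab[0][0] == src_data[0][0]);

  for (int i = 0; i < NI; i++)
    for (int j = 0; j < NJ; j++)
      {
        copy_reload (a_tab, src_tab, i, j, NL);
        copy_hoisted (b_tab, src_tab, i, j, NL);
      }

  return memcmp (out_a, src_data, sizeof src_data) == 0
         && memcmp (out_b, src_data, sizeof src_data) == 0;
}
```

Both variants compute the same result; the second one simply keeps the two pointer loads out of the innermost loop, which is what the expected dump above shows.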
Do you have suggestions for other scalar cleanup passes that could be
run after graphite?
Thanks,
Sebastian Pop
--
AMD - GNU Tools
Index: passes.c
===================================================================
--- passes.c (revision 143190)
+++ passes.c (working copy)
@@ -657,6 +657,12 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_loop_distribution);
 	  NEXT_PASS (pass_linear_transform);
 	  NEXT_PASS (pass_graphite_transforms);
+	    {
+	      struct opt_pass **p = &pass_graphite_transforms.pass.sub;
+	      NEXT_PASS (pass_copy_prop);
+	      NEXT_PASS (pass_dce_loop);
+	      NEXT_PASS (pass_lim);
+	    }
 	  NEXT_PASS (pass_iv_canon);
 	  NEXT_PASS (pass_if_conversion);
 	  NEXT_PASS (pass_vectorize);