This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [graphite] fix quality of generated code
- From: Richard Guenther <rguenther at suse dot de>
- To: Sebastian Pop <sebpop at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Daniel Berlin <dberlin at dberlin dot org>
- Date: Sat, 10 Jan 2009 23:41:23 +0100 (CET)
- Subject: Re: [graphite] fix quality of generated code
- References: <cb9d34b20901101436t20000c71h18ae523fc061ddc4@mail.gmail.com>
On Sat, 10 Jan 2009, Sebastian Pop wrote:
> Hi,
>
> While fixing PR38786, I noticed that the generated code (after the
> fix, which is still under test) looks horrible: in the innermost loop we
> duplicate all the scalar code needed to address the array, as well as
> the loads it needs. Here is what the code looks like after gloog:
>
> bb_10 (preds = {bb_9 }, succs = {bb_11 })
> {
> <bb 10>:
> D.1718_123 = (int) graphiteIV.50_64;
> D.1719_124 = (long unsigned int) D.1718_123;
> D.1720_125 = D.1719_124 * 8;
> D.1721_126 = D.1633_30 + D.1720_125;
> # VUSE <SMT.13_53(D)> { SMT.13 }
> D.1722_127 = *D.1721_126;
> D.1723_128 = (int) graphiteIV.51_75;
> D.1724_129 = (long unsigned int) D.1723_128;
> D.1725_130 = D.1724_129 * 8;
> D.1726_131 = D.1722_127 + D.1725_130;
> # VUSE <SMT.14_54(D)> { SMT.14 }
> D.1727_132 = *D.1726_131;
> D.1708_113 = (int) graphiteIV.50_64;
> D.1709_114 = (long unsigned int) D.1708_113;
> D.1710_115 = D.1709_114 * 8;
> D.1711_116 = D.1618_13 + D.1710_115;
> # VUSE <SMT.13_53(D)> { SMT.13 }
> D.1712_117 = *D.1711_116;
> D.1713_118 = (int) graphiteIV.51_75;
> D.1714_119 = (long unsigned int) D.1713_118;
> D.1715_120 = D.1714_119 * 8;
> D.1716_121 = D.1712_117 + D.1715_120;
> # VUSE <SMT.14_54(D)> { SMT.14 }
> D.1717_122 = *D.1716_121;
> ivtmp.44_105 = graphiteIV.53_99;
> l_106 = (int) ivtmp.44_105;
> D.1627_107 = (long unsigned int) l_106;
> D.1628_108 = D.1627_107 * 4;
> D.1629_109 = D.1717_122 + D.1628_108;
> D.1638_110 = D.1727_132 + D.1628_108;
> # VUSE <SMT.15_136> { SMT.15 }
> D.1639_111 = *D.1638_110;
> # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
> *D.1629_109 = D.1639_111;
> l_112 = l_106 + 1;
>
> }
>
> After scheduling some scalar cleanup passes (see the attached patch), I
> get better-looking code, but the loads with loop-invariant addresses
> are still not moved out of the loop by LIM. I also tried scheduling a
> PRE pass, but that fails. Here is the code after the second LIM:
>
> bb_10 (preds = {bb_9 }, succs = {bb_11 })
> {
> <bb 10>:
> # VUSE <SMT.13_53(D)> { SMT.13 }
> D.1722_127 = *D.1721_126;
> D.1726_131 = D.1722_127 + D.1725_130;
> # VUSE <SMT.14_54(D)> { SMT.14 }
> D.1727_132 = *D.1726_131;
> # VUSE <SMT.13_53(D)> { SMT.13 }
> D.1712_117 = *D.1711_116;
> D.1716_121 = D.1712_117 + D.1715_120;
> # VUSE <SMT.14_54(D)> { SMT.14 }
> D.1717_122 = *D.1716_121;
> l_106 = (int) graphiteIV.53_99;
> D.1627_107 = (long unsigned int) l_106;
> D.1628_108 = D.1627_107 * 4;
> D.1629_109 = D.1717_122 + D.1628_108;
> D.1638_110 = D.1727_132 + D.1628_108;
> # VUSE <SMT.15_136> { SMT.15 }
> D.1639_111 = *D.1638_110;
> # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
> *D.1629_109 = D.1639_111;
>
> }
>
> Here is what I would expect to see after a more aggressive LIM:
>
> bb_10 (preds = {bb_9 }, succs = {bb_11 })
> {
> <bb 10>:
> l_106 = (int) graphiteIV.53_99;
> D.1627_107 = (long unsigned int) l_106;
> D.1628_108 = D.1627_107 * 4;
> D.1629_109 = D.1717_122 + D.1628_108;
> D.1638_110 = D.1727_132 + D.1628_108;
> # VUSE <SMT.15_136> { SMT.15 }
> D.1639_111 = *D.1638_110;
> # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
> *D.1629_109 = D.1639_111;
>
> }
>
> Do you have suggestions for other scalar cleanup passes that could be
> run after graphite?
On the trunk you probably need to recompute alias information. I would
suggest you delay working on this until after the alias-improvements
branch merge (or try it there).
Richard.
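
For readers following the thread, the transformation being asked for can be sketched at the source level in C. This is only an illustration under stated assumptions: the test case from PR38786 is not shown in this message, so the three-level pointer layout below is a guess based on the scaling factors in the dump (8 for the two outer indices, 4 for the innermost one), and the function names copy_naive and copy_hoisted are hypothetical. The first function corresponds to the shape of the code before the cleanup, the second to what a more aggressive LIM would be expected to produce.

```c
#include <stddef.h>

/* Naive form: each iteration of the innermost loop reloads the
   pointers dst[i], dst[i][j], src[i] and src[i][j], although none
   of them depends on l.  (Hypothetical layout, see the note above.) */
void copy_naive(int ***dst, int ***src, size_t ni, size_t nj, size_t nl)
{
    for (size_t i = 0; i < ni; i++)
        for (size_t j = 0; j < nj; j++)
            for (size_t l = 0; l < nl; l++)
                dst[i][j][l] = src[i][j][l];
}

/* Hand-hoisted form: the loop-invariant loads are done once per
   enclosing iteration.  This is only equivalent to copy_naive if the
   stores through drow never overwrite the pointer tables themselves;
   that is the property the compiler has to prove from its alias
   information before it may hoist the loads. */
void copy_hoisted(int ***dst, int ***src, size_t ni, size_t nj, size_t nl)
{
    for (size_t i = 0; i < ni; i++) {
        int **dplane = dst[i];          /* invariant in j and l */
        int **splane = src[i];
        for (size_t j = 0; j < nj; j++) {
            int *drow = dplane[j];      /* invariant in l */
            int *srow = splane[j];
            for (size_t l = 0; l < nl; l++)
                drow[l] = srow[l];
        }
    }
}
```

The sketch also shows why alias information matters here: without proof that the store in the innermost loop cannot modify the memory holding the row pointers, the loads have to stay inside the loop.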