This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [graphite] fix quality of generated code

From: "Daniel Berlin" <dberlin at dberlin dot org>
To: "Sebastian Pop" <sebpop at gmail dot com>
Cc: "GCC Patches" <gcc-patches at gcc dot gnu dot org>, "Richard Guenther" <rguenther at suse dot de>
Date: Sat, 10 Jan 2009 18:44:17 -0500
Subject: Re: [graphite] fix quality of generated code
References: <cb9d34b20901101436t20000c71h18ae523fc061ddc4@mail.gmail.com>

How is PRE failing when you try to schedule it?

On Sat, Jan 10, 2009 at 5:36 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Hi,
>
> While fixing PR38786, I noticed that the generated code (after the
> fix, which is still under test) looks horrible: in the innermost loop
> we duplicate all the scalar code needed to address the array, as well
> as the loads.  Here is what the code looks like after gloog:
>
>          bb_10 (preds = {bb_9 }, succs = {bb_11 })
>          {
>          <bb 10>:
>            D.1718_123 = (int) graphiteIV.50_64;
>            D.1719_124 = (long unsigned int) D.1718_123;
>            D.1720_125 = D.1719_124 * 8;
>            D.1721_126 = D.1633_30 + D.1720_125;
>            # VUSE <SMT.13_53(D)> { SMT.13 }
>            D.1722_127 = *D.1721_126;
>            D.1723_128 = (int) graphiteIV.51_75;
>            D.1724_129 = (long unsigned int) D.1723_128;
>            D.1725_130 = D.1724_129 * 8;
>            D.1726_131 = D.1722_127 + D.1725_130;
>            # VUSE <SMT.14_54(D)> { SMT.14 }
>            D.1727_132 = *D.1726_131;
>            D.1708_113 = (int) graphiteIV.50_64;
>            D.1709_114 = (long unsigned int) D.1708_113;
>            D.1710_115 = D.1709_114 * 8;
>            D.1711_116 = D.1618_13 + D.1710_115;
>            # VUSE <SMT.13_53(D)> { SMT.13 }
>            D.1712_117 = *D.1711_116;
>            D.1713_118 = (int) graphiteIV.51_75;
>            D.1714_119 = (long unsigned int) D.1713_118;
>            D.1715_120 = D.1714_119 * 8;
>            D.1716_121 = D.1712_117 + D.1715_120;
>            # VUSE <SMT.14_54(D)> { SMT.14 }
>            D.1717_122 = *D.1716_121;
>            ivtmp.44_105 = graphiteIV.53_99;
>            l_106 = (int) ivtmp.44_105;
>            D.1627_107 = (long unsigned int) l_106;
>            D.1628_108 = D.1627_107 * 4;
>            D.1629_109 = D.1717_122 + D.1628_108;
>            D.1638_110 = D.1727_132 + D.1628_108;
>            # VUSE <SMT.15_136> { SMT.15 }
>            D.1639_111 = *D.1638_110;
>            # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
>            *D.1629_109 = D.1639_111;
>            l_112 = l_106 + 1;
>
>          }
>
> Now, after scheduling some scalar cleanups (see the attached patch), I
> get better-looking code, but the loads with invariant accesses were
> still not moved out of the loop by LIM.  I tried to schedule PRE, but
> that fails.  Here is the code after the second LIM:
>
>          bb_10 (preds = {bb_9 }, succs = {bb_11 })
>          {
>          <bb 10>:
>            # VUSE <SMT.13_53(D)> { SMT.13 }
>            D.1722_127 = *D.1721_126;
>            D.1726_131 = D.1722_127 + D.1725_130;
>            # VUSE <SMT.14_54(D)> { SMT.14 }
>            D.1727_132 = *D.1726_131;
>            # VUSE <SMT.13_53(D)> { SMT.13 }
>            D.1712_117 = *D.1711_116;
>            D.1716_121 = D.1712_117 + D.1715_120;
>            # VUSE <SMT.14_54(D)> { SMT.14 }
>            D.1717_122 = *D.1716_121;
>            l_106 = (int) graphiteIV.53_99;
>            D.1627_107 = (long unsigned int) l_106;
>            D.1628_108 = D.1627_107 * 4;
>            D.1629_109 = D.1717_122 + D.1628_108;
>            D.1638_110 = D.1727_132 + D.1628_108;
>            # VUSE <SMT.15_136> { SMT.15 }
>            D.1639_111 = *D.1638_110;
>            # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
>            *D.1629_109 = D.1639_111;
>
>          }
>
> Here is what I would expect to see after a more aggressive LIM:
>
>          bb_10 (preds = {bb_9 }, succs = {bb_11 })
>          {
>          <bb 10>:
>            l_106 = (int) graphiteIV.53_99;
>            D.1627_107 = (long unsigned int) l_106;
>            D.1628_108 = D.1627_107 * 4;
>            D.1629_109 = D.1717_122 + D.1628_108;
>            D.1638_110 = D.1727_132 + D.1628_108;
>            # VUSE <SMT.15_136> { SMT.15 }
>            D.1639_111 = *D.1638_110;
>            # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
>            *D.1629_109 = D.1639_111;
>
>          }
>
> Do you have suggestions for other scalar cleanup passes that could be
> run after graphite?
>
> Thanks,
> Sebastian Pop
> --
> AMD - GNU Tools
>
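
For readers following the dumps, here is a rough source-level C analogue of the transformation being asked for. It is only a sketch: the function names and the `int ***` layout are assumptions inferred from the `* 8` and `* 4` address computations in the GIMPLE (pointer-sized and int-sized strides on an LP64 target), not taken from the PR38786 testcase. `copy_naive` corresponds to the loop body after the second LIM, where the two pointer loads per array are still repeated on every iteration of the innermost loop; `copy_hoisted` corresponds to the expected result, where those loads are computed once per (i, j) pair. Hoisting by hand like this is only equivalent when the stores in the inner loop do not overwrite the pointer tables themselves.

```c
/* Innermost loop re-loads dst[i], dst[i][j], src[i] and src[i][j]
   on every iteration of l, even though none of them depends on l.  */
void copy_naive(int ***dst, int ***src, int ni, int nj, int nl)
{
    for (int i = 0; i < ni; i++)
        for (int j = 0; j < nj; j++)
            for (int l = 0; l < nl; l++)
                dst[i][j][l] = src[i][j][l];
}

/* Same copy with the loop-invariant loads moved out of the l loop.
   Assumes the int stores never alias the pointer tables.  */
void copy_hoisted(int ***dst, int ***src, int ni, int nj, int nl)
{
    for (int i = 0; i < ni; i++)
        for (int j = 0; j < nj; j++) {
            int *dst_row = dst[i][j];   /* invariant in l */
            int *src_row = src[i][j];   /* invariant in l */
            for (int l = 0; l < nl; l++)
                dst_row[l] = src_row[l];
        }
}
```

Both functions should produce identical results for non-overlapping inputs; the second form leaves only the index computation, one load and one store in the innermost body, which matches the shape of the "more aggressive LIM" dump above.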
