This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [graphite] fix quality of generated code

From: Richard Guenther <rguenther at suse dot de>
To: Sebastian Pop <sebpop at gmail dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Daniel Berlin <dberlin at dberlin dot org>
Date: Sat, 10 Jan 2009 23:41:23 +0100 (CET)
Subject: Re: [graphite] fix quality of generated code
References: <cb9d34b20901101436t20000c71h18ae523fc061ddc4@mail.gmail.com>

On Sat, 10 Jan 2009, Sebastian Pop wrote:

> Hi,
> 
> While fixing PR38786, I noticed that the generated code (after the
> fix, which is still under test) looks horrible: in the innermost loop
> we duplicate all the scalar code needed to address the array, as well
> as the loads.  Here is what the code looks like after gloog:
> 
>           bb_10 (preds = {bb_9 }, succs = {bb_11 })
>           {
>           <bb 10>:
>             D.1718_123 = (int) graphiteIV.50_64;
>             D.1719_124 = (long unsigned int) D.1718_123;
>             D.1720_125 = D.1719_124 * 8;
>             D.1721_126 = D.1633_30 + D.1720_125;
>             # VUSE <SMT.13_53(D)> { SMT.13 }
>             D.1722_127 = *D.1721_126;
>             D.1723_128 = (int) graphiteIV.51_75;
>             D.1724_129 = (long unsigned int) D.1723_128;
>             D.1725_130 = D.1724_129 * 8;
>             D.1726_131 = D.1722_127 + D.1725_130;
>             # VUSE <SMT.14_54(D)> { SMT.14 }
>             D.1727_132 = *D.1726_131;
>             D.1708_113 = (int) graphiteIV.50_64;
>             D.1709_114 = (long unsigned int) D.1708_113;
>             D.1710_115 = D.1709_114 * 8;
>             D.1711_116 = D.1618_13 + D.1710_115;
>             # VUSE <SMT.13_53(D)> { SMT.13 }
>             D.1712_117 = *D.1711_116;
>             D.1713_118 = (int) graphiteIV.51_75;
>             D.1714_119 = (long unsigned int) D.1713_118;
>             D.1715_120 = D.1714_119 * 8;
>             D.1716_121 = D.1712_117 + D.1715_120;
>             # VUSE <SMT.14_54(D)> { SMT.14 }
>             D.1717_122 = *D.1716_121;
>             ivtmp.44_105 = graphiteIV.53_99;
>             l_106 = (int) ivtmp.44_105;
>             D.1627_107 = (long unsigned int) l_106;
>             D.1628_108 = D.1627_107 * 4;
>             D.1629_109 = D.1717_122 + D.1628_108;
>             D.1638_110 = D.1727_132 + D.1628_108;
>             # VUSE <SMT.15_136> { SMT.15 }
>             D.1639_111 = *D.1638_110;
>             # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
>             *D.1629_109 = D.1639_111;
>             l_112 = l_106 + 1;
> 
>           }
> 
> Now, after scheduling some scalar cleanups (see the attached patch),
> I get better-looking code, but the loads from invariant addresses
> were still not moved out of the loop by LIM.  I tried to schedule a
> PRE pass, but that is failing.  Here is the code after the second LIM:
> 
>           bb_10 (preds = {bb_9 }, succs = {bb_11 })
>           {
>           <bb 10>:
>             # VUSE <SMT.13_53(D)> { SMT.13 }
>             D.1722_127 = *D.1721_126;
>             D.1726_131 = D.1722_127 + D.1725_130;
>             # VUSE <SMT.14_54(D)> { SMT.14 }
>             D.1727_132 = *D.1726_131;
>             # VUSE <SMT.13_53(D)> { SMT.13 }
>             D.1712_117 = *D.1711_116;
>             D.1716_121 = D.1712_117 + D.1715_120;
>             # VUSE <SMT.14_54(D)> { SMT.14 }
>             D.1717_122 = *D.1716_121;
>             l_106 = (int) graphiteIV.53_99;
>             D.1627_107 = (long unsigned int) l_106;
>             D.1628_108 = D.1627_107 * 4;
>             D.1629_109 = D.1717_122 + D.1628_108;
>             D.1638_110 = D.1727_132 + D.1628_108;
>             # VUSE <SMT.15_136> { SMT.15 }
>             D.1639_111 = *D.1638_110;
>             # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
>             *D.1629_109 = D.1639_111;
> 
>           }
> 
> Here is what I would expect to see after a more aggressive LIM:
> 
>           bb_10 (preds = {bb_9 }, succs = {bb_11 })
>           {
>           <bb 10>:
>             l_106 = (int) graphiteIV.53_99;
>             D.1627_107 = (long unsigned int) l_106;
>             D.1628_108 = D.1627_107 * 4;
>             D.1629_109 = D.1717_122 + D.1628_108;
>             D.1638_110 = D.1727_132 + D.1628_108;
>             # VUSE <SMT.15_136> { SMT.15 }
>             D.1639_111 = *D.1638_110;
>             # SMT.15_137 = VDEF <SMT.15_136> { SMT.15 }
>             *D.1629_109 = D.1639_111;
> 
>           }
> 
> Do you have suggestions for other scalar cleanup passes that could be
> run after graphite?

On the trunk you probably need to recompute alias information.  I would
suggest you delay working on this until after the alias-improvements
branch is merged (or try it on that branch).

Richard.
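
For readers following the dumps above, here is a rough source-level sketch of the hoisting Sebastian expects from LIM. It is not taken from PR38786: the function names and the `int ***` types are assumptions, guessed only from the strides in the dump (8-byte steps for the two pointer loads, 4-byte steps for the innermost element access). The two functions behave the same only when the stores in the inner loop cannot modify the pointer tables, which is the kind of fact the compiler needs up-to-date alias information to establish.

```c
#include <stddef.h>

/* Hypothetical source-level analogue of the loop before LIM:
   dst[i][j] and src[i][j] are conceptually reloaded on every
   iteration, like the loads under SMT.13 and SMT.14 in the dump.  */
void copy_row_naive(int ***dst, int ***src, size_t i, size_t j, size_t n)
{
    for (size_t l = 0; l < n; l++)
        dst[i][j][l] = src[i][j][l];
}

/* The same loop with the invariant loads hoisted by hand.  This is
   only equivalent if no store through d[l] can change dst[i],
   dst[i][j], src[i] or src[i][j].  */
void copy_row_hoisted(int ***dst, int ***src, size_t i, size_t j, size_t n)
{
    int *d = dst[i][j];   /* invariant in l: loaded once */
    int *s = src[i][j];   /* invariant in l: loaded once */

    for (size_t l = 0; l < n; l++)
        d[l] = s[l];      /* only the element load and store remain */
}
```

The body of `copy_row_hoisted` corresponds to the "expected" basic block quoted above, where only the index computation, one load and one store are left in the loop.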

Follow-Ups:
- Re: [graphite] fix quality of generated code
  - From: Sebastian Pop

References:
- [graphite] fix quality of generated code
  - From: Sebastian Pop
