[Bug tree-optimization/37916] [4.2/4.3/4.4 Regression] SSA names causing register pressure; unnecessarily many simultaneously "live" names.

amacleod at redhat dot com gcc-bugzilla@gcc.gnu.org
Mon Oct 27 16:23:00 GMT 2008



------- Comment #14 from amacleod at redhat dot com  2008-10-27 16:21 -------
TER's job is to create larger expressions for the expander so that we get
better instruction selection during the initial expansion from trees/tuples to
RTL.

It does this by simply substituting the definition of an ssa-name into its use
location.  This is only done if the definition has a single use; otherwise you
would be executing the definition code more than once, which is generally
undesirable.
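
To make the single-use rule concrete, here is a toy model in Python.  It is
only a sketch of the idea, not GCC's implementation: the names toy_ter and
count_uses are made up for illustration, statements are plain (name, rhs)
string pairs, and it ignores every other check a real implementation needs
(memory, basic-block boundaries and so on).

```python
import re

# Matches identifiers, including dotted SSA-style names such as s1.155.
NAME = re.compile(r"[A-Za-z_][\w.]*")

def count_uses(stmts):
    """Count how often each defined name appears on a right-hand side."""
    uses = {name: 0 for name, _ in stmts}
    for _, rhs in stmts:
        for tok in NAME.findall(rhs):
            if tok in uses:
                uses[tok] += 1
    return uses

def toy_ter(stmts):
    """Forward-substitute every single-use definition into its one use.

    Assumes SSA-like input: each name is defined once, before any use.
    """
    uses = count_uses(stmts)
    pending = {}  # single-use definitions waiting for their use
    out = []
    for name, rhs in stmts:
        def substitute(match):
            tok = match.group(0)
            if tok in pending:
                return "(" + pending.pop(tok) + ")"
            return tok
        rhs = NAME.sub(substitute, rhs)
        if uses[name] == 1:
            pending[name] = rhs       # fold into the use later
        else:
            out.append((name, rhs))   # 0 or 2+ uses: keep the statement
    return out

# t1 and t2 each have one use, so they collapse into one expression.
print(toy_ter([("t1", "a + b"), ("t2", "t1 * c"), ("r", "t2 - d")]))
# t1 has two uses, so nothing is substituted.
print(toy_ter([("t1", "a + b"), ("u", "t1 * c"), ("v", "t1 - d")]))
```

In the first call the three statements become a single larger expression for
r; in the second, t1 is left alone because substituting it would evaluate
a + b twice.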

The code in this example has a string of about 14 serial adds, followed by 14
related adds.

 s1.155 = s1.153 + (long unsigned int) MEM[base: buf.183, offset: 1]{*D.1237};
 s1.157 = s1.155 + (long unsigned int) MEM[base: buf.183, offset: 2]{*D.1240};
 s1.159 = s1.157 + (long unsigned int) MEM[base: buf.183, offset: 3]{*D.1243};
 s1.161 = s1.159 + (long unsigned int) MEM[base: buf.183, offset: 4]{*D.1246};
<...>
 s2.156 = s2.154 + s1.155;
 s2.158 = s2.156 + s1.157;
 s2.160 = s2.158 + s1.159;
 s2.162 = s2.160 + s1.161;

Since s1.155 is used in two different places (by s1.157 and by s2.156), TER
cannot do anything with it.
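
The use counts for the whole pattern can be checked with a small Python model.
This is a sketch under assumptions: the names s1_1, s2_1 and so on are
simplified stand-ins for the ssa-names in the dump, the chain length of 14 is
the approximate figure quoted above, and build_chains and use_counts are
hypothetical helpers, not anything in GCC.

```python
def build_chains(n):
    """n serial s1 adds followed by n s2 adds, as (name, operands) pairs."""
    stmts = []
    for i in range(1, n + 1):
        # s1_i = s1_(i-1) + buf[i]
        stmts.append((f"s1_{i}", [f"s1_{i - 1}", f"buf[{i}]"]))
    for i in range(1, n + 1):
        # s2_i = s2_(i-1) + s1_i
        stmts.append((f"s2_{i}", [f"s2_{i - 1}", f"s1_{i}"]))
    return stmts

def use_counts(stmts):
    """Number of uses of each name defined in stmts."""
    counts = {name: 0 for name, _ in stmts}
    for _, operands in stmts:
        for op in operands:
            if op in counts:
                counts[op] += 1
    return counts

counts = use_counts(build_chains(14))
# Every s1 name except the last feeds both the next s1 add and one s2 add.
print(sorted(name for name, c in counts.items() if c == 2))
```

In this model every intermediate s1 name has two uses, so the single-use rule
rejects all of them.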

A register pressure reduction pass could alleviate this problem, either early
near RTL expansion time or as part of the register allocator spilling
subsystem. Both have been talked about, but I don't believe either has been
worked on to any great degree.

Scheduling could help as well, if it saw fit to start interleaving some of
those adds.

The addition for s1.157 has to wait for s1.155 to finish, and s1.159 then has
to wait for s1.157.  In the meantime s2.156 is ready to execute, so it could
be interleaved between s1.157 and s1.159 while waiting for s1.157 to finish
(which, since it has to go to memory, one would expect might be delayed),
i.e.:
 s1.155 = s1.153 + (long unsigned int) MEM[base: buf.183, offset: 1]{*D.1237};
 s1.157 = s1.155 + (long unsigned int) MEM[base: buf.183, offset: 2]{*D.1240};
 s2.156 = s2.154 + s1.155;
 s1.159 = s1.157 + (long unsigned int) MEM[base: buf.183, offset: 3]{*D.1243};
 s2.158 = s2.156 + s1.157;
 s1.161 = s1.159 + (long unsigned int) MEM[base: buf.183, offset: 4]{*D.1246};
 s2.160 = s2.158 + s1.159;

which would, as a convenient side effect, solve the problem: each s1 name is
consumed shortly after it is defined, so only a few are live at any one time.
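
The effect of the two orderings on the number of simultaneously live names can
be estimated with a backward liveness scan.  The Python below is a rough
sketch, assuming simplified names (s1_1, s2_1, ...) and a chain length of 14;
original_order, interleaved_order and max_live are hypothetical helpers.  It
counts only names defined inside the sequence, so the incoming s1/s2 values
and the buffer pointer are not included.

```python
def s1_add(i):
    return (f"s1_{i}", [f"s1_{i - 1}", f"buf[{i}]"])

def s2_add(i):
    return (f"s2_{i}", [f"s2_{i - 1}", f"s1_{i}"])

def original_order(n):
    """All s1 adds first, then all s2 adds."""
    return [s1_add(i) for i in range(1, n + 1)] + \
           [s2_add(i) for i in range(1, n + 1)]

def interleaved_order(n):
    """s1_1, s1_2, s2_1, s1_3, s2_2, ..., s1_n, s2_(n-1), s2_n."""
    order = [s1_add(1)]
    for i in range(2, n + 1):
        order.append(s1_add(i))
        order.append(s2_add(i - 1))
    order.append(s2_add(n))
    return order

def max_live(stmts):
    """Peak count of locally defined names live at any point.

    Walks the statements backwards: a definition ends a live range,
    a use (of a locally defined name) starts one.
    """
    defined = {name for name, _ in stmts}
    live = set()
    peak = 0
    for name, operands in reversed(stmts):
        live.discard(name)
        live.update(op for op in operands if op in defined)
        peak = max(peak, len(live))
    return peak

print(max_live(original_order(14)))     # grows with the chain length
print(max_live(interleaved_order(14)))  # stays small regardless of length
```

In this model the original order keeps every s1 name alive until the s2 chain
starts consuming them, while the interleaved order never needs more than a
handful at once.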


-- 

amacleod at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37916


