[wwwdocs] Small update for projects/x86.html

Thu Feb 11 21:40:00 GMT 2010

Hello,

This is a small update for a six-year-old page. I wanted to see if any
of the problems mentioned on the page have been fixed now (sadly: no).
OK for wwwdocs?

Ciao!
Steven

	* htdocs/projects/x86.html: Reconfirm a few problems and update
	example assembler output where necessary.

Index: x86.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/x86.html,v
retrieving revision 1.1
diff -u -r1.1 x86.html

--- x86.html	30 Jan 2005 12:22:14 -0000	1.1
+++ x86.html	11 Feb 2010 21:37:52 -0000
@@ -32,8 +32,9 @@
 <hr />
 <h2><a name="csefail">Failure of common subexpression elimination</a></h2>

-<p>(12 Nov 2004) Common subexpression elimination cannot merge
-calculations that take place in different machine modes.  Consider</p>
+<p>(12 Nov 2004, reconfirmed with trunk revision 156706) Common
+subexpression elimination cannot merge calculations that take
+place in different machine modes.  Consider</p>

 <pre>
 static const unsigned char
@@ -59,61 +60,43 @@
 <pre>
 map:
         movzbl  4(%esp), %edx
-        xorl    %ecx, %ecx
-        movb    %dl, %al
-        subb    $33, %al
-        cmpb    $29, %al
-        ja      .L1
-        movzbl  %dl, %eax
-        movzbl  trigraph_map-33(%eax), %ecx
-.L1:
-        movl    %ecx, %eax
+        xorl    %eax, %eax
+        movb    %dl, %cl
+        subb    $33, %cl
+        cmpb    $29, %cl
+        ja      .L2
+        movzbl  %dl, %edx
+        movzbl  trigraph_map-33(%edx), %eax
+.L2:
         ret
 </pre>

-<p>Notice how we subtract 33 from <code>%al</code>, throw that value
-away, reload <code>%eax</code> with the original value, and then
+<p>Notice how we subtract 33 from <code>%cl</code>, throw that value
+away, reload <code>%edx</code> with the original value, and then
 subtract 33 again (with a linker relocation; the processor does not
 do the subtraction twice).</p>

-<p>It would be just as easy to extend the value in <code>%al</code>
-and use it directly.  (<code>%al</code> is the bottom eight bits of
-<code>%eax</code>, so you might think it wasn't even necessary to do
+<p>It would be just as easy to extend the value in <code>%cl</code>
+and use it directly.  (<code>%cl</code> is the bottom eight bits of
+<code>%ecx</code>, so you might think it wasn't even necessary to do
 the extend.  However, modern x86 processors treat them as separate
 registers unless forced, which costs a pipeline stall.)  That might
 look something like this:</p>

 <pre>
 map:
-	movzbl	4(%esp), %eax
-	xorl	%ecx, %ecx
-	subl	$33, %eax
-	cmpl	$29, %eax
-	ja	.L1
-	movzbl	trigraph_map(%eax), %ecx
-.L1:
-	movl	%ecx, %eax
-	ret
-</pre>
-
-<p>This saves a register as well as a couple of move instructions.  If
-this routine were to be inlined, that would become important.  We
-still have unnecessary moves in this version: simply by interchanging
-<code>%ecx</code> and <code>%eax</code> throughout, we can get rid of
-the final move.</p>
-
-<pre>
-map:
 	movzbl	4(%esp), %ecx
 	xorl	%eax, %eax
 	subl	$33, %ecx
 	cmpl	$29, %ecx
-	ja	.L1
+	ja	.L2
 	movzbl	trigraph_map(%ecx), %eax
-.L1:
+.L2:
 	ret
 </pre>

+<p>This saves a register as well as a couple of move instructions.</p>
+
 <p>The difficulty is that common subexpression elimination is
 concerned with potential differences between these pseudo-RTL
 expressions:</p>
@@ -128,14 +111,15 @@
 However, we know that can't happen here, because <code>(reg:QI
 27)</code> is known to be positive at the time we attempt to do the
 <code>zero_extend</code>.  If it were negative, we would have jumped
-to <code>.L1</code>.</p>
+to <code>.L2</code>.</p>

 <hr />
 <h2><a name="storemerge">Store merging</a></h2>

-<p>(12 Nov 2004) GCC frequently generates multiple narrow writes to
-adjacent memory locations.  Memory writes are expensive; it would be
-better if they were combined.  For example:</p>
+<p>(12 Nov 2004, reconfirmed with trunk revision 156706) GCC
+frequently generates multiple narrow writes to adjacent memory
+locations.  Memory writes are expensive; it would be better if
+they were combined.  For example:</p>

 <pre>
 struct rtx_def
@@ -178,15 +162,15 @@
 i1:
 	movl	4(%esp), %eax
 	movl	$0x0, (%eax)
-	movb	$0x17, 2(%eax)
 	movw	$0x0c, (%eax)
+	movb	$0x17, 2(%eax)
 	ret

 i2:
 	movl	4(%esp), %eax
-	movb	$0x0, 3(%eax)
 	movw	$0x0c, (%eax)
 	movb	$0x17, 2(%eax)
+	movb	$0x00, 3(%eax)
 	ret
 </pre>

@@ -209,9 +193,9 @@
 <hr />
 <h2><a name="volatile">Volatile inhibits too many optimizations</a></h2>

-<p>(12 Nov 2004) gcc refuses to perform in-memory operations on
-volatile variables, on architectures that have those operations.
-Compare:</p>
+<p>(12 Nov 2004, reconfirmed with trunk revision 156706) GCC refuses
+to perform in-memory operations on volatile variables, on architectures
+that have those operations.  Compare:</p>

 <pre>
 extern int a;
@@ -249,9 +233,9 @@
 <hr />
 <h2><a name="rndmode">Unnecessary changes of rounding mode</a></h2>

-<p>(12 Aug 2004) gcc does not remember the state of the floating point
-control register, so it changes it more than necessary.  Consider the
-following:</p>
+<p>(12 Aug 2004, reconfirmed with trunk revision 156706) GCC does not
+remember the state of the floating point control register, so it
+changes it more than necessary.  Consider the following:</p>

 <pre>
 void
@@ -356,11 +340,11 @@
 <hr />
 <h2><a name="fpmove">Moving floating point through integer registers</a></h2>

-<p>(22 Jan 2000) GCC knows how to move <code>float</code> quantities
-using integer instructions.  This is normally a win because floating
-point moves take more cycles.  However, it increases the pressure on
-the minuscule integer register file and therefore can end up making
-things worse.</p>
+<p>(22 Jan 2000, reconfirmed with trunk revision 156706) GCC knows how
+to move <code>float</code> quantities using integer instructions.  This
+is normally a win because floating point moves take more cycles.
+However, it increases the pressure on the minuscule integer register
+file and therefore can end up making things worse.</p>

 <pre>
 void
@@ -385,24 +369,24 @@
 side.  Only the inner loop is shown.</p>

 <pre>
-  2.95 @ -O2		3.1 @ -O2		    3.1 @ -O2 -fomit-fp
-  .L6:			.L6:			    .L6:
-  flds	(%edi,%eax,4)	movl  (%edi,%edx,4), %eax   movl  (%edi,%edx,4), %eax
-  fstps (%ebx,%eax,4)	movl  %eax, (%ebx,%edx,4)   movl  %eax, (%ebx,%edx,4)
-  flds	(%esi,%eax,4)	movl  (%esi,%edx,4), %eax   movl  (%esi,%edx,4), %eax
-  fstps (%ecx,%eax,4)	movl  %eax, (%ecx,%edx,4)   movl  %eax, (%ecx,%edx,4)
-  incl	%eax		incl  %edx		    incl  %edx
-  cmpl	%edx,%eax	cmpl  24(%ebp), %edx	    cmpl  %ebx, %edx
-  jl	.L6		jl    .L6		    jl	  .L6
+  2.95 @ -O2		3.1 @ -O2		    4.5 @ -O2
+  .L6:			.L6:			    .L3:
+  flds	(%edi,%eax,4)	movl  (%edi,%edx,4), %eax   movl  (%ebx,%eax,4), %edx
+  fstps (%ebx,%eax,4)	movl  %eax, (%ebx,%edx,4)   movl  %edx, (%edi,%eax,4)
+  flds	(%esi,%eax,4)	movl  (%esi,%edx,4), %eax   movl  (%esi,%eax,4), %edx
+  fstps (%ecx,%eax,4)	movl  %eax, (%ecx,%edx,4)   movl  %edx, 0(%ebp,%eax,4)
+  incl	%eax		incl  %edx		    incl  %eax
+  cmpl	%edx,%eax	cmpl  24(%ebp), %edx	    cmpl  %ecx, %eax
+  jl	.L6		jl    .L6		    jl	  .L3
 </pre>

 <p>The loop requires seven registers: four base pointers, an index, a
 limit, and a scratch.  All but the scratch must be integer.  The x86
-has only six integer registers under normal conditions.  gcc 2.95 uses
-a float register for the scratch, so the loop just fits.  2.96 tries
+has only six integer registers under normal conditions.  GCC 2.95 uses
+a float register for the scratch, so the loop just fits.  GCC 3.1 tries
 to use an integer register, and has to spill the limit register onto
-the stack to make everything fit.  Adding
-<code>-fomit-frame-pointer</code> makes a seventh integer register
+the stack to make everything fit.  GCC 4.5 (and GCC 3.1 if one adds
+<code>-fomit-frame-pointer</code>) makes a seventh integer register
 available, and the loop fits again.</p>

 <p>This is not that bad as these things go.  (GCC 3.0 was horrible; it
@@ -486,6 +470,7 @@
 <tr><td></td>        <td></td>         <td>yes</td> <td>3.839</td> <td>97.14</td></tr>
 <tr><td></td>        <td>3.1</td>      <td>no</td> <td>3.860</td> <td>97.67</td></tr>
 <tr><td></td>        <td></td>         <td>yes</td> <td>3.845</td> <td>97.30</td></tr>
+<tr><td></td>        <td>4.5</td>      <td>no</td> <td>3.845</td> <td>97.30</td></tr>
 <tr><td>fcpy2</td>   <td>3.1</td>      <td>yes</td> <td>3.815</td> <td>96.54</td></tr>
 <tr><td>fcpy3</td>   <td></td>         <td></td> <td>2.860</td> <td>72.37</td></tr>
 </table>
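
For readers unfamiliar with the store-merging item touched by the patch
above, here is a minimal sketch in C of what "combining" the narrow
writes in i2 could mean.  The struct rec layout and all function names
are hypothetical and chosen only to mirror the offsets in the assembler
output (a 16-bit field at offset 0, 8-bit fields at offsets 2 and 3);
this is not the rtx_def definition from the page, and it is not a
description of how GCC would implement the optimization.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte record: 16-bit field at offset 0, then two
   8-bit fields at offsets 2 and 3.  Not the real rtx_def layout.  */
struct rec {
    uint16_t code;
    uint8_t  mode;
    uint8_t  flags;
};

/* Compile-time check of the assumption that there is no padding.  */
typedef char rec_is_4_bytes[sizeof(struct rec) == 4 ? 1 : -1];

/* Three narrow stores, in the style of i2.  */
void init_narrow(struct rec *r)
{
    r->code  = 0x0c;
    r->mode  = 0x17;
    r->flags = 0x00;
}

/* The same bytes packed into one 32-bit value.  It is built through a
   temporary so that it is right for either byte order; on
   little-endian x86 the value is 0x0017000c.  */
uint32_t merged_word(void)
{
    struct rec tmp;
    uint32_t w;

    init_narrow(&tmp);
    memcpy(&w, &tmp, sizeof w);
    return w;
}

/* One wide store carrying all three fields at once.  */
void init_merged(struct rec *r)
{
    uint32_t w = merged_word();

    memcpy(r, &w, sizeof w);
}
```

Both init_narrow and init_merged leave the record with identical
contents; the second form simply expresses the result as a single
32-bit write, which is the shape one would hope to see in the
generated code for i2.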