Clobbers and Scratch Registers
Alan Modra
amodra@gmail.com
Mon Aug 21 07:44:00 GMT 2017
This is a revised version of
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01562.html limited to
showing just the scratch register aspect, as a followup to
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01174.html
* doc/extend.texi (Extended Asm <Clobbers>): Rename to
"Clobbers and Scratch Registers". Add paragraph on
alternative to clobbers for scratch registers and OpenBLAS
example.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 940490e..0637672 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8075,7 +8075,7 @@ A comma-separated list of C expressions read by the instructions in the
@item Clobbers
A comma-separated list of registers or other values changed by the
@var{AssemblerTemplate}, beyond those listed as outputs.
-An empty list is permitted. @xref{Clobbers}.
+An empty list is permitted. @xref{Clobbers and Scratch Registers}.
@item GotoLabels
When you are using the @code{goto} form of @code{asm}, this section contains
@@ -8435,7 +8435,7 @@ The enclosing parentheses are a required part of the syntax.
When the compiler selects the registers to use to
represent the output operands, it does not use any of the clobbered registers
-(@pxref{Clobbers}).
+(@pxref{Clobbers and Scratch Registers}).
Output operand expressions must be lvalues. The compiler cannot check whether
the operands have data types that are reasonable for the instruction being
@@ -8671,7 +8671,8 @@ as input. The enclosing parentheses are a required part of the syntax.
@end table
When the compiler selects the registers to use to represent the input
-operands, it does not use any of the clobbered registers (@pxref{Clobbers}).
+operands, it does not use any of the clobbered registers
+(@pxref{Clobbers and Scratch Registers}).
If there are no output operands but there are input operands, place two
consecutive colons where the output operands would go:
@@ -8722,9 +8723,10 @@ asm ("cmoveq %1, %2, %[result]"
: "r" (test), "r" (new), "[result]" (old));
@end example
-@anchor{Clobbers}
-@subsubsection Clobbers
+@anchor{Clobbers and Scratch Registers}
+@subsubsection Clobbers and Scratch Registers
@cindex @code{asm} clobbers
+@cindex @code{asm} scratch registers
While the compiler is aware of changes to entries listed in the output
operands, the inline @code{asm} code may modify more than just the outputs. For
@@ -8853,6 +8855,65 @@ dscal (size_t n, double *x, double alpha)
@}
@end smallexample
+Rather than allocating fixed registers via clobbers to provide scratch
+registers for an @code{asm} statement, an alternative is to define a
+variable and make it an early-clobber output as with @code{a2} and
+@code{a3} in the example below. This gives the compiler register
+allocator more freedom. You can also define a variable and make it an
+output tied to an input as with @code{a0} and @code{a1}, tied
+respectively to @code{ap} and @code{lda}. Of course, with tied
+outputs your @code{asm} can't use the input value after modifying the
+output register since they are one and the same register. Note also
+that tying an input to an output is the way to set up an initialized
+temporary register modified by an @code{asm} statement. An input not
+tied to an output is assumed by GCC to be unchanged, for example
+@code{"b" (16)} below sets up @code{%11} to 16, and GCC might use that
+register in following code if the value 16 happened to be needed. You
+can even use a normal @code{asm} output for a scratch if all inputs
+that might share the same register are consumed before the scratch is
+used. The VSX registers clobbered by the @code{asm} statement could
+have used this technique except for GCC's limit on the number of
+@code{asm} parameters.
+
+@smallexample
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+ const double *x, double *y, double alpha)
+@{
+ double *a0;
+ double *a1;
+ double *a2;
+ double *a3;
+
+ __asm__
+ (
+ /* lots of asm here */
+ "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+ "#a0=%3 a1=%4 a2=%5 a3=%6"
+ :
+ "+m" (*(double (*)[n]) y),
+ "+r" (n), // 1
+ "+b" (y), // 2
+ "=b" (a0), // 3
+ "=b" (a1), // 4
+ "=&b" (a2), // 5
+ "=&b" (a3) // 6
+ :
+ "m" (*(const double (*)[n]) x),
+ "m" (*(const double (*)[]) ap),
+ "d" (alpha), // 9
+ "r" (x), // 10
+ "b" (16), // 11
+ "3" (ap), // 12
+ "4" (lda) // 13
+ :
+ "cr0",
+ "vs32","vs33","vs34","vs35","vs36","vs37",
+ "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+ );
+@}
+@end smallexample
+
@anchor{GotoLabels}
@subsubsection Goto Labels
@cindex @code{asm} goto labels
--
Alan Modra
Australia Development Lab, IBM
More information about the Gcc-patches
mailing list