[PATCH] Asm memory constraints

Mon Aug 21 07:36:00 GMT 2017

On Sun, Aug 20, 2017 at 08:00:53AM -0500, Segher Boessenkool wrote:
> Hi Alan,
> 
> On Sat, Aug 19, 2017 at 12:19:35AM +0930, Alan Modra wrote:
> > +Flushing registers to memory has performance implications and may be
> > +an issue for time-sensitive code.  You can provide better information
> > +to GCC to avoid this, as shown in the following examples.  At a
> > +minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
> > +need to be flushed.  Also, if GCC can prove that all of the outputs of
> > +a non-volatile @code{asm} statement are unused, then the @code{asm}
> > +may be deleted.  Removal of otherwise dead @code{asm} statements will
> > +not happen if they clobber @code{"memory"}.
> 
> void f(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x) : "memory"); }
> void g(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x)); }
> 
> Both f and g are completely removed by the first jump pass immediately
> after expand (via delete_trivially_dead_insns).
> 
> Do you have a testcase for the behaviour you saw?

Oh my.  I was sure that was how "memory" worked!  I see though that
every gcc I have lying around, going all the way back to gcc-2.95,
deletes the asm in your testcase.  I definitely don't want to put
something in the docs that is plain wrong, or just my idea of how
things ought to work, so the last two sentences quoted above need to
go.  Thanks for the correction.

Fixed in this revised patch.  The only controversial aspect now should
be whether those array casts ought to be officially blessed.  I've
checked that "=m" (*(T (*)[]) ptr), "=m" (*(T (*)[n]) ptr), and
"=m" (*(T (*)[10]) ptr), all generate reasonable MEM_ATTRS handled
apparently properly by alias.c and other code.

For example, at -O3 the following shows gcc moving the read of "val"
before the asm, while an asm using a "memory" clobber forces the read
to occur after the asm.

static int
f (double *x)
{
  int res;
  asm ("#%0 %1 %2" : "=r" (res) : "r" (x), "m" (*(double (*)[]) x));
  return res;
}

int val = 123;
double foo[10];

int
main ()
{
  int b = f (foo);
  __builtin_printf ("%d %d\n", val, b);
  return 0;
}


I'm also encouraged by comments like the following by rth in 2004
(gcc/c/c-typeck.c), which say that using non-kosher lvalues in memory
output constraints must continue to be supported.

      /* ??? Really, this should not be here.  Users should be using a
	 proper lvalue, dammit.  But there's a long history of using casts
	 in the output operands.  In cases like longlong.h, this becomes a
	 primitive form of typechecking -- if the cast can be removed, then
	 the output operand had a type of the proper width; otherwise we'll
	 get an error.  Gross, but ...  */
      STRIP_NOPS (output);


	* doc/extend.texi (Clobbers): Correct vax example.  Delete old
	example of a memory input for a string of known length.  Move
	commentary out of table.  Add a number of new examples
	covering array memory inputs.
testsuite/
	* gcc.target/i386/asm-mem.c: New test.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 649be01..940490e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8755,7 +8755,7 @@ registers:
 asm volatile ("movc3 %0, %1, %2"
                    : /* No outputs. */
                    : "g" (from), "g" (to), "g" (count)
-                   : "r0", "r1", "r2", "r3", "r4", "r5");
+                   : "r0", "r1", "r2", "r3", "r4", "r5", "memory");
 @end example
 
 Also, there are two special clobber arguments:
@@ -8786,14 +8786,72 @@ Note that this clobber does not prevent the @emph{processor} from doing
 speculative reads past the @code{asm} statement. To prevent that, you need 
 processor-specific fence instructions.
 
-Flushing registers to memory has performance implications and may be an issue 
-for time-sensitive code.  You can use a trick to avoid this if the size of 
-the memory being accessed is known at compile time. For example, if accessing 
-ten bytes of a string, use a memory input like: 
+@end table
 
-@code{@{"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )@}}.
+Flushing registers to memory has performance implications and may be
+an issue for time-sensitive code.  You can provide better information
+to GCC to avoid this, as shown in the following examples.  At a
+minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
+need to be flushed.
 
-@end table
+Here is a fictitious sum of squares instruction, that takes two
+pointers to floating point values in memory and produces a floating
+point register output.
+Notice that @code{x}, and @code{y} both appear twice in the @code{asm}
+parameters, once to specify memory accessed, and once to specify a
+base register used by the @code{asm}.  You won't normally be wasting a
+register by doing this as GCC can use the same register for both
+purposes.  However, it would be foolish to use both @code{%1} and
+@code{%3} for @code{x} in this @code{asm} and expect them to be the
+same.  In fact, @code{%3} may well not be a register.  It might be a
+symbolic memory reference to the object pointed to by @code{x}.
+
+@smallexample
+asm ("sumsq %0, %1, %2"
+     : "+f" (result)
+     : "r" (x), "r" (y), "m" (*x), "m" (*y));
+@end smallexample
+
+Here is a fictitious @code{*z++ = *x++ * *y++} instruction.
+Notice that the @code{x}, @code{y} and @code{z} pointer registers
+must be specified as input/output because the @code{asm} modifies
+them.
+
+@smallexample
+asm ("vecmul %0, %1, %2"
+     : "+r" (z), "+r" (x), "+r" (y), "=m" (*z)
+     : "m" (*x), "m" (*y));
+@end smallexample
+
+An x86 example where the string memory argument is of unknown length.
+
+@smallexample
+asm("repne scasb"
+    : "=c" (count), "+D" (p)
+    : "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
+@end smallexample
+
+If you know the above will only be reading a ten byte array then you
+could instead use a memory input like:
+@code{"m" (*(const char (*)[10]) p)}.
+
+Here is an example of a PowerPC vector scale implemented in assembly,
+complete with vector and condition code clobbers, and some initialized
+offset registers that are unchanged by the @code{asm}.
+
+@smallexample
+void
+dscal (size_t n, double *x, double alpha)
+@{
+  asm ("/* lots of asm here */"
+       : "+m" (*(double (*)[n]) x), "+r" (n), "+b" (x)
+       : "d" (alpha), "b" (32), "b" (48), "b" (64),
+         "b" (80), "b" (96), "b" (112)
+       : "cr0",
+         "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39",
+         "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47");
+@}
+@end smallexample
 
 @anchor{GotoLabels}
 @subsubsection Goto Labels
diff --git a/gcc/testsuite/gcc.target/i386/asm-mem.c b/gcc/testsuite/gcc.target/i386/asm-mem.c
new file mode 100644
index 0000000..89b713f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/asm-mem.c
@@ -0,0 +1,59 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+/* Check that "m" array references are effective in preventing the
+   array initialization from wandering past a use in the asm, and
+   that the casts remain supported.  */
+
+static int
+f1 (const char *p)
+{
+  int count;
+
+  __asm__ ("repne scasb"
+	   : "=c" (count), "+D" (p)
+	   : "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
+  return -2 - count;
+}
+
+static int
+f2 (const char *p)
+{
+  int count;
+
+  __asm__ ("repne scasb"
+	   : "=c" (count), "+D" (p)
+	   : "m" (*(const char (*)[48]) p), "0" (-1), "a" (0));
+  return -2 - count;
+}
+
+static int
+f3 (int n, const char *p)
+{
+  int count;
+
+  __asm__ ("repne scasb"
+	   : "=c" (count), "+D" (p)
+	   : "m" (*(const char (*)[n]) p), "0" (-1), "a" (0));
+  return -2 - count;
+}
+
+int
+main ()
+{
+  int a;
+  char buff[48] = "hello world";
+  buff[4] = 0;
+  a = f1 (buff);
+  if (a != 4)
+    __builtin_abort ();
+  buff[4] = 'o';
+  a = f2 (buff);
+  if (a != 11)
+    __builtin_abort ();
+  buff[4] = 0;
+  a = f3 (48, buff);
+  if (a != 4)
+    __builtin_abort ();
+  return 0;
+}

-- 
Alan Modra
Australia Development Lab, IBM