Bug 84013 - wrong __restrict clique with inline asm operand
Summary: wrong __restrict clique with inline asm operand
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 8.0
: P3 normal
Target Milestone: ---
Assignee: Richard Biener
URL:
Keywords: alias, missed-optimization
Depends on:
Blocks:
 
Reported: 2018-01-23 23:39 UTC by Katsunori Kumatani
Modified: 2018-10-24 09:42 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work: 9.0
Known to fail: 6.4.0, 7.1.0, 7.2.0
Last reconfirmed: 2018-01-24 00:00:00


Attachments
patch I am testing (1.61 KB, patch)
2018-10-24 07:53 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Katsunori Kumatani 2018-01-23 23:39:23 UTC
Using a __restrict parameter in an asm results in the wrong clique assigned to the MEM_REF if the same pointer is used without an asm (where it has the correct clique). This happens since GCC 6.4.0 and up (including GCC 7 and trunk). GCC 6.3.0 assigns the clique differently (and even worse, in some cases, even generates wrong code), for 6.4.0 a fix was backported but the fix still has wrong cliques. Consider this snippet:

auto abc(int** __restrict a, int** __restrict b)
{
  *a = 0;
  asm volatile("":"+m"(*b));
  *a = 0;
  return *b;
}

Compile with -O2 -fipa-pta (the clique is _always_ wrong, but -fipa-pta is needed for demonstration for some reason, I don't know why). The output on x86_64:

GCC 6.3.0:
  movq $0, (%rdi)
  movq (%rsi), %rax
  ret

GCC 6.4.0 / GCC 7 / trunk:
  movq $0, (%rdi)
  movq $0, (%rdi)             # redundant
  movq (%rsi), %rax
  ret

As you can see, the RTL DSE pass doesn't remove the redundant store *a, because the clique is wrong and thinks the asm can alias with *a, which is not the case, because they're both __restrict. GCC 6.3.0 generates "better" code because it has the other bug 79552 (which is fixed), so that's not a solution.

I'll comment with the cliques I got when inspecting after the "optimized" pass (final gimple pass):

auto abc(int** __restrict a, int** __restrict b)
{
  *a = 0;                     // clique 1 base 1
  asm volatile("":"+m"(*b));  // clique 0 base 0 (wrong)
  *a = 0;                     // clique 1 base 1
  return *b;                  // clique 1 base 2 (what it should be)
}

The asm should obviously have clique 1 base 2 to match the return, since it's the same pointer. To me, this is clearly a bug. I don't know if it's a missed optimization or can even produce wrong code in certain cases, however.

Note that if you remove the return (i.e. any use of the *b pointer other than the asm), you get this:

auto abc(int** __restrict a, int** __restrict b)
{
  *a = 0;                     // clique 1 base 1
  asm volatile("":"+m"(*b));  // clique 1 base 0
  *a = 0;                     // clique 1 base 1
}

Which is correct because it has the same clique as *a (1), but the base 0 worries me a bit, shouldn't it be base 2 as well just like before?

Note that the cliques are _always_ wrong, but the redundant output appears only with -fipa-pta (even though not even one call is involved), not sure why (not sure how it can remove the redundant store in that case though).
Comment 1 Richard Biener 2018-01-24 09:18:12 UTC
Yes, this is a known limitation:

      if (restrict_var)
        {
          /* Now look at possible dereferences of ptr.  */
          imm_use_iterator ui;
          gimple *use_stmt;
          bool used = false;
          FOR_EACH_IMM_USE_STMT (use_stmt, ui, ptr)
            {
              /* ???  Calls and asms.  */
              if (!gimple_assign_single_p (use_stmt))
                continue;
Comment 2 Katsunori Kumatani 2018-01-31 19:31:15 UTC
I'm not familiar with tree-ssa-structalias, but it appears to me that the "fix" is quite simple? Or am I missing something? Here's the snippet from it, updated with my attempt:

      if (restrict_var)
	{
	  /* Now look at possible dereferences of ptr.  */
	  imm_use_iterator ui;
	  gimple *use_stmt;
	  bool used = false;
	  FOR_EACH_IMM_USE_STMT (use_stmt, ui, ptr)
	    {
	      if (!gimple_assign_single_p (use_stmt))
		{
		  /* ???  Calls.  */
		  if (gimple_code (use_stmt) != GIMPLE_ASM)
		    continue;

		  gasm *asm_stmt = as_a <gasm *> (use_stmt);
		  unsigned n = gimple_asm_ninputs (asm_stmt);
		  for (unsigned i = 0; i < n; i++)
		    {
		      tree op = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
		      used |= maybe_set_dependence_info (op, ptr, clique,
							 restrict_var,
							 last_ruid);
		    }
		  n = gimple_asm_noutputs (asm_stmt);
		  for (unsigned i = 0; i < n; i++)
		    {
		      tree op = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
		      used |= maybe_set_dependence_info (op, ptr, clique,
							 restrict_var,
							 last_ruid);
		    }
		  continue;
		}


Does this not work? Sorry if I am missing something here.
Comment 3 rguenther@suse.de 2018-02-01 08:36:41 UTC
On Wed, 31 Jan 2018, katsunori.kumatani at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84013
> 
> --- Comment #2 from Katsunori Kumatani <katsunori.kumatani at gmail dot com> ---
> I'm not familiar with tree-ssa-structalias, but it appears to me that the "fix"
> is quite simple? Or am I missing something? Here's the snippet from it, updated
> with my attempt:
> 
>       if (restrict_var)
>         {
>           /* Now look at possible dereferences of ptr.  */
>           imm_use_iterator ui;
>           gimple *use_stmt;
>           bool used = false;
>           FOR_EACH_IMM_USE_STMT (use_stmt, ui, ptr)
>             {
>               if (!gimple_assign_single_p (use_stmt))
>                 {
>                   /* ???  Calls.  */
>                   if (gimple_code (use_stmt) != GIMPLE_ASM)
>                     continue;
> 
>                   gasm *asm_stmt = as_a <gasm *> (use_stmt);
>                   unsigned n = gimple_asm_ninputs (asm_stmt);
>                   for (unsigned i = 0; i < n; i++)
>                     {
>                       tree op = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
>                       used |= maybe_set_dependence_info (op, ptr, clique,
>                                                          restrict_var,
>                                                          last_ruid);
>                     }
>                   n = gimple_asm_noutputs (asm_stmt);
>                   for (unsigned i = 0; i < n; i++)
>                     {
>                       tree op = TREE_VALUE (gimple_asm_output_op (asm_stmt,
> i));
>                       used |= maybe_set_dependence_info (op, ptr, clique,
>                                                          restrict_var,
>                                                          last_ruid);
>                     }
>                   continue;
>                 }
> 
> 
> Does this not work? Sorry if I am missing something here.

Yes, this should work.  It's not difficult to fix I just thought it
wasn't important to track restrict across asm()s ...  A similar
fix is needed for call return and operands.  Writing the code
a little nicer and more compact should be possible as well - I
just need to think about it somewhat (similar to the code a bit
below we should be able to use walk_stmt_load_store_ops).

I will deal with this for GCC 9 given this isn't a regression.
Comment 4 Katsunori Kumatani 2018-02-01 13:15:28 UTC
Thanks, it's quite useful in some "meta asm" cases (in conjunction with plugins, asms can be useful since you can't add builtins). Or when doing custom calls in asms (or syscalls, etc) and you know what memory they touch, without having to clobber everything. :)
Comment 5 Katsunori Kumatani 2018-10-03 20:45:52 UTC
Hi, any news of this for GCC 9? I'm guessing it requires a bit more changes, hopefully not forgotten though. Currently I'm using a custom patched GCC 8 for it (and to test plugin behavior with it) but it's not ideal.
Comment 6 Richard Biener 2018-10-24 07:53:10 UTC
Created attachment 44886 [details]
patch I am testing

Patch I am testing - sorry for the delay.
Comment 7 Richard Biener 2018-10-24 09:42:35 UTC
Fixed.
Comment 8 Richard Biener 2018-10-24 09:42:55 UTC
Author: rguenth
Date: Wed Oct 24 09:42:19 2018
New Revision: 265455

URL: https://gcc.gnu.org/viewcvs?rev=265455&root=gcc&view=rev
Log:
2018-10-24  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/84013
	* tree-ssa-structalias.c (struct msdi_data): New struct for
	marshalling data to walk_stmt_load_store_ops.
	(maybe_set_dependence_info): Refactor as callback for
	walk_stmt_load_store_ops.
	(compute_dependence_clique): Set restrict info on all stmt kinds.

	* gcc.dg/tree-ssa/restrict-9.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/restrict-9.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-structalias.c