Using a __restrict parameter in an asm results in the wrong clique assigned to the MEM_REF if the same pointer is used without an asm (where it has the correct clique). This happens since GCC 6.4.0 and up (including GCC 7 and trunk). GCC 6.3.0 assigns the clique differently (and even worse, in some cases, even generates wrong code), for 6.4.0 a fix was backported but the fix still has wrong cliques. Consider this snippet: auto abc(int** __restrict a, int** __restrict b) { *a = 0; asm volatile("":"+m"(*b)); *a = 0; return *b; } Compile with -O2 -fipa-pta (the clique is _always_ wrong, but -fipa-pta is needed for demonstration for some reason, I don't know why). The output on x86_64: GCC 6.3.0: movq $0, (%rdi) movq (%rsi), %rax ret GCC 6.4.0 / GCC 7 / trunk: movq $0, (%rdi) movq $0, (%rdi) # redundant movq (%rsi), %rax ret As you can see, the RTL DSE pass doesn't remove the redundant store *a, because the clique is wrong and thinks the asm can alias with *a, which is not the case, because they're both __restrict. GCC 6.3.0 generates "better" code because it has the other bug 79552 (which is fixed), so that's not a solution. I'll comment with the cliques I got when inspecting after the "optimized" pass (final gimple pass): auto abc(int** __restrict a, int** __restrict b) { *a = 0; // clique 1 base 1 asm volatile("":"+m"(*b)); // clique 0 base 0 (wrong) *a = 0; // clique 1 base 1 return *b; // clique 1 base 2 (what it should be) } The asm should obviously have clique 1 base 2 to match the return, since it's the same pointer. To me, this is clearly a bug. I don't know if it's a missed optimization or can even produce wrong code in certain cases, however. Note that if you remove the return (i.e. any use of the *b pointer other than the asm), you get this: auto abc(int** __restrict a, int** __restrict b) { *a = 0; // clique 1 base 1 asm volatile("":"+m"(*b)); // clique 1 base 0 *a = 0; // clique 1 base 1 } Which is correct because it has the same clique as *a (1), but the base 0 worries me a bit, shouldn't it be base 2 as well just like before? Note that the cliques are _always_ wrong, but the redundant output appears only with -fipa-pta (even though not even one call is involved), not sure why (not sure how it can remove the redundant store in that case though).
Yes, this is a known limitation: if (restrict_var) { /* Now look at possible dereferences of ptr. */ imm_use_iterator ui; gimple *use_stmt; bool used = false; FOR_EACH_IMM_USE_STMT (use_stmt, ui, ptr) { /* ??? Calls and asms. */ if (!gimple_assign_single_p (use_stmt)) continue;
I'm not familiar with tree-ssa-structalias, but it appears to me that the "fix" is quite simple? Or am I missing something? Here's the snippet from it, updated with my attempt: if (restrict_var) { /* Now look at possible dereferences of ptr. */ imm_use_iterator ui; gimple *use_stmt; bool used = false; FOR_EACH_IMM_USE_STMT (use_stmt, ui, ptr) { if (!gimple_assign_single_p (use_stmt)) { /* ??? Calls. */ if (gimple_code (use_stmt) != GIMPLE_ASM) continue; gasm *asm_stmt = as_a <gasm *> (use_stmt); unsigned n = gimple_asm_ninputs (asm_stmt); for (unsigned i = 0; i < n; i++) { tree op = TREE_VALUE (gimple_asm_input_op (asm_stmt, i)); used |= maybe_set_dependence_info (op, ptr, clique, restrict_var, last_ruid); } n = gimple_asm_noutputs (asm_stmt); for (unsigned i = 0; i < n; i++) { tree op = TREE_VALUE (gimple_asm_output_op (asm_stmt, i)); used |= maybe_set_dependence_info (op, ptr, clique, restrict_var, last_ruid); } continue; } Does this not work? Sorry if I am missing something here.
On Wed, 31 Jan 2018, katsunori.kumatani at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84013 > > --- Comment #2 from Katsunori Kumatani <katsunori.kumatani at gmail dot com> --- > I'm not familiar with tree-ssa-structalias, but it appears to me that the "fix" > is quite simple? Or am I missing something? Here's the snippet from it, updated > with my attempt: > > if (restrict_var) > { > /* Now look at possible dereferences of ptr. */ > imm_use_iterator ui; > gimple *use_stmt; > bool used = false; > FOR_EACH_IMM_USE_STMT (use_stmt, ui, ptr) > { > if (!gimple_assign_single_p (use_stmt)) > { > /* ??? Calls. */ > if (gimple_code (use_stmt) != GIMPLE_ASM) > continue; > > gasm *asm_stmt = as_a <gasm *> (use_stmt); > unsigned n = gimple_asm_ninputs (asm_stmt); > for (unsigned i = 0; i < n; i++) > { > tree op = TREE_VALUE (gimple_asm_input_op (asm_stmt, i)); > used |= maybe_set_dependence_info (op, ptr, clique, > restrict_var, > last_ruid); > } > n = gimple_asm_noutputs (asm_stmt); > for (unsigned i = 0; i < n; i++) > { > tree op = TREE_VALUE (gimple_asm_output_op (asm_stmt, > i)); > used |= maybe_set_dependence_info (op, ptr, clique, > restrict_var, > last_ruid); > } > continue; > } > > > Does this not work? Sorry if I am missing something here. Yes, this should work. It's not difficult to fix I just thought it wasn't important to track restrict across asm()s ... A similar fix is needed for call return and operands. Writing the code a little nicer and more compact should be possible as well - I just need to think about it somewhat (similar to the code a bit below we should be able to use walk_stmt_load_store_ops). I will deal with this for GCC 9 given this isn't a regression.
Thanks, it's quite useful in some "meta asm" cases (in conjunction with plugins, asms can be useful since you can't add builtins). Or when doing custom calls in asms (or syscalls, etc) and you know what memory they touch, without having to clobber everything. :)
Hi, any news of this for GCC 9? I'm guessing it requires a bit more changes, hopefully not forgotten though. Currently I'm using a custom patched GCC 8 for it (and to test plugin behavior with it) but it's not ideal.
Created attachment 44886 [details] patch I am testing Patch I am testing - sorry for the delay.
Fixed.
Author: rguenth Date: Wed Oct 24 09:42:19 2018 New Revision: 265455 URL: https://gcc.gnu.org/viewcvs?rev=265455&root=gcc&view=rev Log: 2018-10-24 Richard Biener <rguenther@suse.de> PR tree-optimization/84013 * tree-ssa-structalias.c (struct msdi_data): New struct for marshalling data to walk_stmt_load_store_ops. (maybe_set_dependence_info): Refactor as callback for walk_stmt_load_store_ops. (compute_dependence_clique): Set restrict info on all stmt kinds. * gcc.dg/tree-ssa/restrict-9.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/restrict-9.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-structalias.c