This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr


Hi Richard,

thanks for your input,

On 01/18/2016 12:36 PM, Richard Biener wrote:
On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.bruel@st.com> wrote:
When compiling code with attribute targets on arm or aarch64,
vector_type_mode returns different results (eg Vmode or BLKmode) depending
on the current simd flags that are not set between functions.

for example the following code:

#include <arm_neon.h>

extern int8x8_t a;
extern int8x8_t b;

int16x8_t
__attribute__ ((target("fpu=neon")))
foo(void)
{
    return vaddl_s8 (a, b);
}

Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins ,
because the mismatch and DECL_MODE current's TYPE_MODE used in
expand_builtin for global variables.

but the best explanation is in the vector_type_mode:
/* Vector types need to re-check the target flags each time we report
     the machine mode.  We need to do this because attribute target can
     change the result of vector_mode_supported_p and have_regs_of_mode
     on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
     change on a per-function basis.  */

I first tried to hack the 2 machine descriptions to insert convert_to_mode
or relayout_decls here and there, but I found this very fragile. Instead a
more central relayout the of type while expanding gave good results, as
proposed here.

bootstraped and tested with no regression for arm, aarch64 and i586.

Does this look to be the right approach ?

nb: for testing this patch is complementary with

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html

thanks for your comments.
A x86 specific testcase that ICEs as well:

typedef int v8si __attribute__((vector_size(32)));
v8si a;
v8si __attribute__((target("avx"))) foo()
{
   return a;
}

in your patch not using the shared DECL_RTL of the global var
"fixes" this so I think a conceptually better fix would be to
"adjust" DECL_RTL from globals via a adjust_address (or so).

Also given that we do

       /* ... fall through ...  */

     case FUNCTION_DECL:
     case RESULT_DECL:
       decl_rtl = DECL_RTL (exp);
     expand_decl_rtl:
       gcc_assert (decl_rtl);
       decl_rtl = copy_rtx (decl_rtl);

thus always "unshare" DECL_RTL anyway it might be not so
bad to simply do

      decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);

instead of that to avoid one copy.

Index: expr.c
===================================================================
--- expr.c      (revision 232496)
+++ expr.c      (working copy)
@@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target
        decl_rtl = DECL_RTL (exp);
      expand_decl_rtl:
        gcc_assert (decl_rtl);
-      decl_rtl = copy_rtx (decl_rtl);
+      if (MEM_P (decl_rtl))
+       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
+      else
+       decl_rtl = copy_rtx (decl_rtl);
        /* Record writes to register variables.  */
        if (modifier == EXPAND_WRITE
           && REG_P (decl_rtl)

untested apart from on the x86_64 testcase (which it fixes).  One could guard
this further to only apply on vector typed decls with mismatched mode of course.

I think that re-layouting globals is not very good design.

Richard.

A few other ICEs with this implementation, for instance if the context is not in a function, such as

typedef __simd64_int8_t int8x8_t;

extern int8x8_t b;
int8x8_t *a = &b;

So, to avoid a var re-layout and a copy_rtx (implied by adjust_address btw). What about just calling 'change_address' ? like: (very lightly tested)

Index: expr.c
===================================================================
--- expr.c    (revision 232564)
+++ expr.c    (working copy)
@@ -9392,7 +9392,8 @@
             enum expand_modifier modifier, rtx *alt_rtl,
             bool inner_reference_p)
 {
-  rtx op0, op1, temp, decl_rtl;
+  rtx op0, op1, temp;
+  rtx decl_rtl = NULL_RTX;
   tree type;
   int unsignedp;
   machine_mode mode, dmode;
@@ -9590,11 +9591,22 @@
       && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
     layout_decl (exp, 0);

+      decl_rtl = DECL_RTL (exp);
+
+      if (MEM_P (decl_rtl)
+      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
+    {
+      if (current_function_decl
+          && (! reload_completed && !reload_in_progress))
+        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
+    }
+
       /* ... fall through ...  */

     case FUNCTION_DECL:
     case RESULT_DECL:
-      decl_rtl = DECL_RTL (exp);
+      if (! decl_rtl)
+    decl_rtl = DECL_RTL (exp);
     expand_decl_rtl:
       gcc_assert (decl_rtl);
       decl_rtl = copy_rtx (decl_rtl);

I'm not sure that moving the code in the 'expand_decl_rtl' label is best, as we'd need to test for exp and the case should only happen for global vars (not functions or results)

thanks,


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]