This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH][AArch64] PR target/69613: Return zero TARGET_SHIFT_TRUNCATION_MASK when SHIFT_COUNT_TRUNCATED is false


Hi all,

In this wrong-code PR we get bad code when synthesising a TImode right shift
by variable amount using DImode shifts during expand.

The expand_double_word_shift function expands two paths: one where the
variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for aarch64) at runtime
and one where it's less than that, and performs a conditional select between the two in the end.

The second path is the one that goes wrong here.
The algorithm it uses for calculating a TImode shift when the variable shift amount is <64 is:


  ------DImode-----   ------DImode----- ------DImode-----   ------DImode-----
{ |__ target-hi ___| |___ target-lo ___| } = { |___ source-hi ___| |___ source-lo ___| } >> var_amnt

1) carry = source-hi << 1
2) tmp = ~var_amnt // either that or BITS_PER_WORD - 1 - var_amnt depending on shift_truncation_mask
3) carry = carry << tmp // net effect is that carry is source-hi shifted left by BITS_PER_WORD - var_amnt
4) target-lo = source-lo >>u var_amnt //unsigned shift.
5) target-lo = target-lo | carry
6) target-hi = source-hi >> var_amnt

where the two DImode words source-hi and source-lo are the two words of the source TImode value
and var_amnt is the register holding the variable shift amount. This is performed by the
expand_subword_shift function in optabs.c.

Step 2) is valid only if the target truncates the shift amount by the width of the type its shifting,
that is SHIFT_COUNT_TRUNCATED is true and TARGET_SHIFT_TRUNCATION_MASK is 63 in this case.

Step 3) is the one that goes wrong.  On aarch64 a left shift is usually implemented using the LSL
instruction but we also have alternatives that can use the SIMD registers and the USHL instruction.
In this case we end up using the USHL instruction. However, although the USHL instruction does
a DImode shift, it doesn't truncate the shift amount by 64, but rather by 255. Furthermore, the
shift amount is interpreted as a 2's complement signed integer and if it's negative performs
a right shift.  This is in contrast with the normal LSL instruction which just performs an
unsigned shift by an amount truncated by 64.

Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we don't assume shift
amounts are truncated unless we know we can only ever use the LSL instructions for variable
shifts.

However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for DImode, so expand_subword_shift
assumes that since the mask is non-zero it's a valid shift truncation mask.
The documentation for TARGET_SHIFT_TRUNCATION_MASK says:
" The default implementation of this function returns
  @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED}
  and 0 otherwise."

So since for TARGET_SIMD we cannot guarantee that all shifts truncate their amount, we should be
returning 0 in TARGET_SHIFT_TRUNCATION_MASK when SHIFT_COUNT_TRUNCATED is false.
This is what the patch does, and with it step 2) becomes:
2) tmp = BITS_PER_WORD - 1 - var_amnt
which behaves correctly on aarch64.

Unfortunately, the testcase from the PR has very recently gone latent on trunk because it
depends on register allocation picking a particular alternative from the *aarch64_ashl_sisd_or_int_<mode>3
pattern in aarch64.md. I tried to do some inline asm tricks to get it to force the correct constraints
but was unsuccessful. Nevertheless I've included the testcase in the patch. I suppose it's better than nothing.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill


2016-02-25  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    PR target/69613
    * config/aarch64/aarch64.c (aarch64_shift_truncation_mask):
    Return 0 if !SHIFT_COUNT_TRUNCATED

2016-02-25  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    PR target/69613
    * gcc.dg/torture/pr69613.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT
 aarch64_shift_truncation_mask (machine_mode mode)
 {
   return
-    (aarch64_vector_mode_supported_p (mode)
+    (!SHIFT_COUNT_TRUNCATED
+     || aarch64_vector_mode_supported_p (mode)
      || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
 }
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c b/gcc/testsuite/gcc.dg/torture/pr69613.c
new file mode 100644
index 0000000000000000000000000000000000000000..44f2b0cc91ac4b12c4d255b608f95bc8bf016923
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69613.c
@@ -0,0 +1,40 @@
+/* PR target/69613.  */
+/* { dg-do run { target int128 } } */
+/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
+
+typedef unsigned short u16;
+typedef unsigned short v32u16 __attribute__ ((vector_size (32)));
+typedef unsigned int u32;
+typedef unsigned int v32u32 __attribute__ ((vector_size (32)));
+typedef unsigned long long u64;
+typedef unsigned long long v32u64 __attribute__ ((vector_size (32)));
+typedef unsigned __int128 u128;
+typedef unsigned __int128 v32u128 __attribute__ ((vector_size (32)));
+
+u128 __attribute__ ((noinline, noclone))
+foo (u32 u32_0, u64 u64_1, u128 u128_1, v32u16 v32u16_0, v32u128 v32u128_0,
+     v32u16 v32u16_1, v32u32 v32u32_1, v32u64 v32u64_1, v32u128 v32u128_1)
+{
+  u128 temp = (v32u128_1[0] << ((-u32_0) & 127));
+  u32 t2 = (u32_0 & 127);
+  v32u128_1[0] = (v32u128_1[0] >> t2);
+
+  v32u128_1[0] ^= temp;
+  v32u128_1 |= (v32u128){ v32u128_0[1] };
+
+  return u64_1 + u128_1 + v32u16_0[0] + v32u16_0[1] + v32u16_1[11]
+	 + v32u16_1[12] + v32u16_1[13] + v32u32_1[0] + v32u32_1[1]
+	 + v32u32_1[2] + v32u64_1[1] + v32u64_1[2] + v32u64_1[3] + v32u128_1[0]
+	 + v32u128_1[1];
+}
+
+int
+main ()
+{
+  u128 x
+      = foo (1, 1, 1, (v32u16){ 1, 1, 1 }, (v32u128){ 1 }, (v32u16){ 1, 1, 1 },
+	     (v32u32){ 1 }, (v32u64){ 1, 1, 1 }, (v32u128){ -1 });
+  if (x != 6)
+    __builtin_abort ();
+  return 0;
+}

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]