This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][GCC] Simplification of 1U << (31 - x)


Still waiting on Jakub's comment on whether there are more things needed at the backend. But I have updated the patch according to Richard's comments.

Thanks
Sudi



From: Richard Biener <richard.guenther@gmail.com>
Sent: Friday, August 4, 2017 11:16 AM
To: Sudi Das
Cc: Wilco Dijkstra; Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
    
On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das <Sudi.Das@arm.com> wrote:
>
>
>
>
> Sorry about the delayed response but looking at the above discussion, should I conclude that this is a valid tree simplification?

Yes, I think so.  Jakub requested code to undo this at RTL expansion
based on target costs, not sure if we really should
require that from you given the user could have written the target
sequence himself.

Few comments about the patch:

+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus INTEGER_CST@1 @2))

I think this warrants a single_use check on the minus (note :s isn't enough
as with the unsigned case we'd happily ignore it by design).

+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+       && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))

You can relax this with using

          && wi::eq_p (@1, TYPE_PRECISION (type) - 1)

+   (if (TYPE_UNSIGNED(type))
+     (rshift (lshift @0 @1) @2)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))
+

You can write (convert (rshift ...)), without the :type.

I'm leaving it to Jakub whether you need to write that RTL expansion tweak.

Thanks,
Richard.

> I am pasting the diff of the assembly that AArch64 generates with the test case that I added. I see fewer instructions generated with the patch.
>
> --- pr80131-1.s    2017-08-01 10:02:43.243374174 +0100
> +++ pr80131-1.s-patched    2017-08-01 10:00:54.776455630 +0100
> @@ -24,10 +24,8 @@
>      str    x0, [sp, 8]
>      ldr    x0, [sp, 8]
>      mov    w1, w0
> -    mov    w0, 63
> -    sub    w0, w0, w1
> -    mov    x1, 1
> -    lsl    x0, x1, x0
> +    mov    x0, -9223372036854775808
> +    lsr    x0, x0, x1
>      add    sp, sp, 16
>      ret
>      .size    f2, .-f2
> @@ -39,10 +37,8 @@
>      str    x0, [sp, 8]
>      ldr    x0, [sp, 8]
>      mov    w1, w0
> -    mov    w0, 63
> -    sub    w0, w0, w1
> -    mov    x1, 1
> -    lsl    x0, x1, x0
> +    mov    x0, -9223372036854775808
> +    lsr    x0, x0, x1
>      add    sp, sp, 16
>      ret
>      .size    f3, .-f3
> @@ -52,11 +48,9 @@
>  f4:
>      sub    sp, sp, #16
>      str    w0, [sp, 12]
> -    mov    w1, 31
>      ldr    w0, [sp, 12]
> -    sub    w0, w1, w0
> -    mov    w1, 1
> -    lsl    w0, w1, w0
> +    mov    w1, -2147483648
> +    lsr    w0, w1, w0
>      add    sp, sp, 16
>      ret
>      .size    f4, .-f4
>
>
> Thanks
>
> Sudi
>
>
>
>
> From: Wilco Dijkstra
> Sent: Thursday, April 13, 2017 1:01 PM
> To: Richard Biener; Jakub Jelinek
> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>
> Richard Biener wrote:
>> It is IMHO a valid GIMPLE optimization / canonicalization.
>>
>>        movabsq $-9223372036854775808, %rax
>>
>> so this should then have been generated as 1<<63?
>>
>> At some point variable shifts were quite expensive as well..
>
> Yes I don't see a major difference between movabsq and
>
>         movl    $1, %eax
>         salq    $63, %rax
>
> on my Sandy Bridge, but if the above is faster then that is what the x64
> backend should emit - it's 1 byte smaller as well, so probably better in all
> cases.
>
> Wilco
    
diff --git a/gcc/match.pd b/gcc/match.pd
index e9017e4..160c12d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && tree_nop_conversion_p (type, TREE_TYPE (@1)))
    (lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3))
+  (if (INTEGRAL_TYPE_P (type)
+       && wi::eq_p (@2, TYPE_PRECISION (type) - 1)
+       && single_use (@1))
+   (if (TYPE_UNSIGNED(type))
+     (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 0000000..317ea3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1
+   f1 is not simplified but f2, f3 and f4 are. */
+
+__INT64_TYPE__ f1 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (31 - i);
+}
+
+__INT64_TYPE__ f2 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (63 - i);
+}
+
+__UINT64_TYPE__ f3 (__INT64_TYPE__ i)
+{
+  return (__UINT64_TYPE__)1 << (63 - i);
+}
+
+__INT32_TYPE__ f4 (__INT32_TYPE__ i)
+{
+  return (__INT32_TYPE__)1 << (31 - i);
+}
+
+/* { dg-final { scan-tree-dump-times "= 31 -"  1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "9223372036854775808 >>" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump "2147483648 >>" "gimple" } } */

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]