Re: [wide-int] int_traits <tree>


On Oct 17, 2013, at 1:46 AM, Richard Biener <rguenther@suse.de> wrote:
> [This function shows another optimization issue:
> 
>    case BOOLEAN_TYPE:
>      /* Cache false or true.  */
>      limit = 2;
>      if (wi::leu_p (cst, 1))
>        ix = cst.to_uhwi ();
> 
> I would have expected cst <= 1 be optimized to cst.len == 1 &&
> cst.val[0] <= 1.

So lts_p has the same problem, and is used just below.  If we do:

@@ -1461,12 +1470,22 @@ wi::ne_p (const T1 &x, const T2 &y)
 inline bool
 wi::lts_p (const wide_int_ref &x, const wide_int_ref &y)
 {
-  if (x.precision <= HOST_BITS_PER_WIDE_INT
-      && y.precision <= HOST_BITS_PER_WIDE_INT)
-    return x.slow () < y.slow ();
-  else
-    return lts_p_large (x.val, x.len, x.precision, y.val, y.len,
-                       y.precision);
+  // We optimize w < N, where N is 64 or fewer bits.
+  if (y.precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      // If it fits directly into a shwi, we can compare directly.
+      if (wi::fits_shwi_p (x))
+       return x.slow () < y.slow ();
+      // If it doesn't fit, then it must be more negative than any
+      // value in y, and hence smaller.
+      if (neg_p (x, SIGNED))
+       return true;
+      // If it is positive, then it must be larger than any value in y,
+      // and hence greater.
+      return false;
+    }
+  return lts_p_large (x.val, x.len, x.precision, y.val, y.len,
+                     y.precision);
 }

then for:

bool MGEN(wide_int cst) {
  return wi::lts_p (cst, 100);
}

we generate:

        movl    520(%rsp), %eax
        cmpl    $1, %eax
        je      .L5283
        subl    $1, %eax
        movq    8(%rsp,%rax,8), %rax
        shrq    $63, %rax
        ret
        .p2align 4,,10
        .p2align 3
.L5283:
        cmpq    $99, 8(%rsp)
        setle   %al
        ret

which, as you can see, never calls lts_p_large; so if that call was the only reason the storage escapes, the compiler should now be able to optimize better.  The code above is minimal, makes no function calls, and the 100 is propagated all the way down into the cmpq.  The reason to structure it this way is that, most of the time, most types are 64 bits or fewer; when that holds, lts_p_large is never called.
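
To spell out what that assembly corresponds to: for a right-hand operand of 64 or fewer bits, the inlined fast path effectively folds down to something like this (hand-expanded sketch only; the len/val accesses are what fits_shwi_p, slow and neg_p are assumed to boil down to):

// Hand-expanded sketch of wi::lts_p (cst, 100) after inlining.
bool MGEN_expanded (wide_int cst) {
  if (cst.len == 1)
    // Fits in a single HWI: the cmpq $99 / setle above.
    return cst.val[0] < 100;
  // Doesn't fit in a shwi: the answer is just the sign of the top
  // element, i.e. the shrq $63 above.
  return cst.val[cst.len - 1] < 0;
}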

If the above 100 is changed to l, where l is a parameter of type long, then the code is the same except that the last part is:

        cmpq    8(%rsp), %rdi
        setg    %al
        ret
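
(For reference, the test function for that variant would presumably be just:

bool MGEN(wide_int cst, long l) {
  return wi::lts_p (cst, l);
}

with everything else unchanged.)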

If we have two wide_int's, then the code does this:

        .cfi_startproc
        subq    $8, %rsp
	.cfi_def_cfa_offset 16
        movl    1052(%rsp), %r9d
        movl    1048(%rsp), %r8d
        movl    532(%rsp), %edx
        movl    528(%rsp), %esi
        cmpl    $64, %r9d
        ja      .L5281
        cmpl    $1, %esi
	je      .L5284
        subl    $1, %esi
        movq    16(%rsp,%rsi,8), %rax
        addq    $8, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        shrq    $63, %rax
        ret
        .p2align 4,,10
        .p2align 3
.L5281:
        .cfi_restore_state
        leaq    536(%rsp), %rcx
        leaq    16(%rsp), %rdi
        call    _ZN2wi11lts_p_largeEPKljjS1_jj
	addq    $8, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
        .p2align 4,,10
        .p2align 3
.L5284:
        .cfi_restore_state
        movq    536(%rsp), %rax
        cmpq    %rax, 16(%rsp)
        setl    %al
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

which is still pretty nice.
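
(That test function is presumably the obvious one:

bool MGEN(wide_int a, wide_int b) {
  return wi::lts_p (a, b);
}

and the out-of-line call to lts_p_large only survives on the path where the right-hand precision exceeds HOST_BITS_PER_WIDE_INT.)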

Does this look ok?  Kenny, can you double-check the cases?  I think I have them right, but…  a double check would be good.

> the representation should guarantee the
> compare with a low precision (32 bit) constant is evaluatable
> at compile-time if len of the larger value is > 1, no?

I agree, though I didn't check the symmetric case, since constants and smaller values can be put on the right by convention.
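
If we did want it, the symmetric fast path would presumably just mirror the one above; something like this untested sketch, guarded on x.precision instead:

  // Untested sketch: mirrored fast path for when the left-hand
  // operand is the one with 64 or fewer bits of precision.
  if (x.precision <= HOST_BITS_PER_WIDE_INT)
    {
      // If y also fits in a shwi, compare directly.
      if (wi::fits_shwi_p (y))
        return x.slow () < y.slow ();
      // Otherwise y is either more negative than any value of x,
      // so x is not less than it...
      if (neg_p (y, SIGNED))
        return false;
      // ...or more positive than any value of x, and hence greater.
      return true;
    }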

> The question is whether we should try to optimize wide-int for
> such cases or simply not use wi:leu_p (cst, 1) but rather
> 
> if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1)
> 
> ?

Since we can do an excellent job with the simple interface, I'd argue for keeping the simple interface and making the simple change above to handle the conditional better.  I'd rather up the ante on metaprogramming to get whatever performance we want while retaining the simple interfaces.

