[Bug target/78794] [7 Regression] We noticed ~9% regression in 32-bit mode for 462.libquantum on Avoton after r243202

Tue Dec 13 14:55:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78794

--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)

> Yes, this is a good idea.
Also, since pandn on a non-BMI target replaces four arith insns with one, the
gain should be raised by 2 * ix86_cost->add, for a total of 3 * ix86_cost->add.
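
The arithmetic behind that total can be sketched as a small standalone C model. This is only an illustration, not code from i386.c: ADD_COST is a hypothetical stand-in for ix86_cost->add, and the split of the four scalar insns into two NOTs and two ANDs (one per 32-bit half of the DImode value) is an assumption about what "four arith insns" refers to.

```c
/* Hypothetical stand-in for ix86_cost->add; the real value depends on
   the selected tuning.  */
enum { ADD_COST = 1 };

/* Assumed scalar sequence for a 64-bit (~a & b) on a 32-bit target
   without BMI: NOT and AND on each 32-bit half, four insns in total.  */
static int scalar_andnot_cost(void)
{
  return 4 * ADD_COST;
}

/* Vector sequence: a single pandn.  */
static int vector_andnot_cost(void)
{
  return 1 * ADD_COST;
}

/* Gain as described above: the base gain for AND, plus two more adds
   when the source is an andnot and BMI is not available.  */
static int andnot_gain(int has_bmi)
{
  int gain = ADD_COST;
  if (!has_bmi)
    gain += 2 * ADD_COST;
  return gain;
}
```

Under these assumptions, andnot_gain (0) equals scalar_andnot_cost () minus vector_andnot_cost (), i.e. 3 * ADD_COST.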

The final patch is thus:

--cut here--

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1cd1cd8..6a746b2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3417,7 +3417,11 @@ dimode_scalar_chain::compute_convert_gain ()
               || GET_CODE (src) == AND)
        {
          gain += ix86_cost->add;
-         if (CONST_INT_P (XEXP (src, 0)))
+         /* Additional gain for andnot for targets without BMI.  */
+         if (GET_CODE (XEXP (src, 0)) == NOT
+             && !TARGET_BMI)
+           gain += 2 * ix86_cost->add;
+         else if (CONST_INT_P (XEXP (src, 0)))
            gain -= vector_const_cost (XEXP (src, 0));
          if (CONST_INT_P (XEXP (src, 1)))
            gain -= vector_const_cost (XEXP (src, 1));
--cut here--

Please also note that on BMI targets, the attached testcase won't be converted,
which is a good thing - the loop on BMI targets looks like:

.L4:
        movl    4(%eax), %edi
        andn    4(%esp), %edi, %ebx
        movl    (%eax), %esi
        movl    %ebx, %ebp
        andn    (%esp), %esi, %ecx
        orl     %ecx, %ebp
        jne     .L3
        xorl    8(%esp), %esi
        xorl    12(%esp), %edi
        movl    %esi, (%eax)
        movl    %edi, 4(%eax)
.L3:
        addl    $12, %eax
        cmpl    %edx, %eax
        jne     .L4
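
For readers who prefer C, the loop above can be read back roughly as follows. This is a reconstruction from the assembly only, not the attached testcase: the struct layout, the function name and the parameter names are all hypothetical, and the 12-byte element stride is assumed to come from a 64-bit field followed by a 32-bit one on i386.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type: a 64-bit value at offset 0 followed by
   4 more bytes, which gives a 12-byte stride on i386.  */
struct node
{
  uint64_t state;
  uint32_t other;
};

/* For every element whose state has all bits of MASK set, flip the
   bits in TOGGLE.  The test (~state & mask) == 0 is what the two andn
   insns plus orl/jne compute on the two 32-bit halves.  */
void toggle_where_mask_set(struct node *p, size_t n,
                           uint64_t mask, uint64_t toggle)
{
  for (size_t i = 0; i < n; i++)
    if ((~p[i].state & mask) == 0)
      p[i].state ^= toggle;
}
```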