Bug 28691 - missed optimization, redundant scalar SSE comparisons
Summary: missed optimization, redundant scalar SSE comparisons
Status: RESOLVED DUPLICATE of bug 28685
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.2.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ssemmx
Depends on:
Blocks:
 
Reported: 2006-08-11 05:33 UTC by tbp
Modified: 2009-09-17 10:25 UTC (History)
5 users (show)

See Also:
Host: x86*
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-04-22 22:53:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description tbp 2006-08-11 05:33:22 UTC
Symptoms: gcc fails to notice flags already available and generates redundant comparisons.

After tracking down a real world performance issue i couldn't rid of without ressorting to inline asm, i've bugged Uros; he produced that reduced testcase.

int test(float a, float b)
{
 int lt = a < b;
 int eq = a == b;

 return lt + eq;
}

with gcc-4.2-20060805/cygwin, -O3 -march=k8 -mfpmath=sse -fomit-frame-pointer -ffast-math:
00401050 <test(float, float)>:
  401050:       movss  0x4(%esp),%xmm1
  401056:       xor    %eax,%eax
  401058:       movss  0x8(%esp),%xmm0
  40105e:       comiss %xmm0,%xmm1
  401061:       sete   %al
  401064:       xor    %edx,%edx
  401066:       comiss %xmm0,%xmm1
  401069:       setb   %dl
  40106c:       add    %edx,%eax
  40106e:       ret

Also happens on x86-64.
This is not only related to PR 17390, but also integer compares :)
Comment 1 Andrew Pinski 2006-08-11 05:52:26 UTC
The problem is obvious from the RTL:
(insn:TI 47 7 48 (set (reg:CCFP 17 flags)
        (compare:CCFP (reg/v:SF 22 xmm1 [orig:59 a ] [59])
            (reg/v:SF 21 xmm0 [orig:60 b ] [60]))) 23 {*cmpfp_i_sse} (insn_list:REG_DEP_OUTPUT 46 (insn_list:REG_DEP_TRUE 6 (insn_list:REG_DEP_TRUE 7 (nil))))
    (nil))

(insn 48 47 43 (set (strict_low_part (reg:QI 0 ax [62]))
        (unlt:QI (reg:CCFP 17 flags)
            (const_int 0 [0x0]))) 347 {*setcc_2} (insn_list:REG_DEP_TRUE 47 (insn_list:REG_DEP_TRUE 46 (nil)))
    (expr_list:REG_DEAD (reg:CCFP 17 flags)
        (nil)))

(insn 43 48 44 (parallel [
            (set (reg:SI 1 dx [64])
                (const_int 0 [0x0]))
            (clobber (reg:CC 17 flags))
        ]) 32 {*movsi_xor} (insn_list:REG_DEP_ANTI 48 (insn_list:REG_DEP_OUTPUT 47 (nil)))
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn:TI 44 43 45 (set (reg:CCFP 17 flags)
        (compare:CCFP (reg/v:SF 22 xmm1 [orig:59 a ] [59])
            (reg/v:SF 21 xmm0 [orig:60 b ] [60]))) 23 {*cmpfp_i_sse} (insn_list:REG_DEP_ANTI 48 (insn_list:REG_DEP_OUTPUT 43 (insn_list:REG_DEP_OUTPUT 47 (insn_list:REG_DEP_TRUE 6 (insn_list:REG_DEP_TRUE 7 (nil))))))
    (expr_list:REG_DEAD (reg/v:SF 22 xmm1 [orig:59 a ] [59])
        (expr_list:REG_DEAD (reg/v:SF 21 xmm0 [orig:60 b ] [60])
            (nil))))

(insn 45 44 18 (set (strict_low_part (reg:QI 1 dx [64]))
        (uneq:QI (reg:CCFP 17 flags)
            (const_int 0 [0x0]))) 347 {*setcc_2} (insn_list:REG_DEP_TRUE 44 (insn_list:REG_DEP_TRUE 43 (nil)))
    (expr_list:REG_DEAD (reg:CCFP 17 flags)
        (nil)))

But the question is xor really setting the flags?
Maybe even just getting the expansion of setting the int.

Anyways here is a work around:
int test(float a, float b)
{
 unsigned char lt = a < b;
 unsigned char eq = a == b;

 unsigned char a1 = (lt + eq);
 return a1;
}
Using unsigned char and a temp variable removes the problem of zero extending the grabbing of the flags.
Comment 2 tbp 2006-08-11 06:07:21 UTC
Subject: Re:  missed optimization, redundant scalar SSE comparisons

On 11 Aug 2006 05:52:26 -0000, pinskia at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
> Using unsigned char and a temp variable removes the problem of zero extending
> the grabbing of the flags.
For sure the code looks better when sprinkling uchars - modulo partial
stalls,  but gcc decidedly doesn't want to fuse those comparisons ;)

00418ed0 <kdlib::lbox_t::set_category(unsigned int, float, kdlib::sideness_t)>:
  418ed0:       sub    $0x10,%esp
  418ed3:       mov    %esi,0x4(%esp)
  418ed7:       mov    0x14(%esp),%esi
  418edb:       mov    %edi,0x8(%esp)
  418edf:       mov    0x18(%esp),%edi
  418ee3:       mov    %ebx,(%esp)
  418ee6:       mov    %ebp,0xc(%esp)
  418eea:       movzbl 0x34(%esi),%edx
  418eee:       movss  0x1c(%esp),%xmm1
  418ef4:       shl    $0x4,%edi
  418ef7:       movzbl 0x18(%esp),%ecx
  418efc:       movss  (%edi,%esi,1),%xmm0
  418f01:       comiss %xmm0,%xmm1
  418f04:       seta   %al
  418f07:       comiss %xmm1,%xmm0
  418f0a:       mov    %eax,%ebp
  418f0c:       mov    %edx,%eax
  418f0e:       sete   %bl
  418f11:       shr    $0x3,%al
  418f14:       and    $0x7,%eax
  418f17:       sar    %cl,%eax
  418f19:       mov    0x20(%esp),%ecx
  418f1d:       and    %eax,%ebx
  418f1f:       comiss 0x8(%edi,%esi,1),%xmm1
  418f24:       setb   %al
  418f27:       and    $0xfffffff8,%edx
  418f2a:       and    %ebp,%eax
  418f2c:       and    $0x1,%ebx
  418f2f:       cmove  %ebp,%ecx
  418f32:       or     %eax,%edx
  418f34:       and    $0x3,%ecx
  418f37:       add    %ecx,%ecx
  418f39:       or     %ecx,%edx
  418f3b:       mov    %dl,0x34(%esi)
  418f3e:       mov    (%esp),%ebx
  418f41:       mov    0x4(%esp),%esi
  418f45:       mov    0x8(%esp),%edi
  418f49:       mov    0xc(%esp),%ebp
  418f4d:       add    $0x10,%esp
  418f50:       ret
Comment 3 Andrew Pinski 2006-08-11 06:25:06 UTC
Hmm, now it is because the order int the compares are different:
  418f01:       comiss %xmm0,%xmm1
  418f04:       seta   %al
  418f07:       comiss %xmm1,%xmm0

see how the first is xmm0, xmm1 While the second is xmm1, xmm0.

Can you file a seperate bug with the source to that function (and make sure it compiles on its own) because that is a different bug as far as I can tell?
Comment 4 tbp 2006-08-11 06:43:37 UTC
Subject: Re:  missed optimization, redundant scalar SSE comparisons

On 11 Aug 2006 06:25:09 -0000, pinskia at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #3 from pinskia at gcc dot gnu dot org  2006-08-11 06:25 -------
> Hmm, now it is because the order int the compares are different:
>   418f01:       comiss %xmm0,%xmm1
>   418f04:       seta   %al
>   418f07:       comiss %xmm1,%xmm0
>
> see how the first is xmm0, xmm1 While the second is xmm1, xmm0.
>
> Can you file a seperate bug with the source to that function (and make sure it
> compiles on its own) because that is a different bug as far as I can tell?
I suppose i could, but that would be an extremly fragile testcase; i
mean when i let that function (the version with uchars) inline, then
the first 2 comparisons get fused.

The fact is that you cannot reliably trigger "optimal" codegen, as far
as i can see.
Comment 5 Uroš Bizjak 2009-09-17 10:25:42 UTC
Actually a duplicate of PR 28685.

*** This bug has been marked as a duplicate of 28685 ***