12966 – x86 array comparison optimization

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end
Version:	3.3.2

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	16996

Reported:	2003-11-08 18:15 UTC by Dave Richards
Modified:	2011-05-22 14:37 UTC
CC List:	2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2011-05-22 16:37:22

Description Dave Richards 2003-11-08 18:15:35 UTC

gnatmake -O3 q

See OPTIMIZE below.

Ada source:

with Ada.Text_IO;
use Ada.Text_IO;

procedure Q is

type Unsigned_32 is
       mod 2 ** 32;

type A_Type is
       array (1..4) of Unsigned_32;

X: A_Type := (0, 0, 0, 0);
Y: A_Type := (0, 0, 0, 0);

begin
  if X = Y then
    Put_Line(Standard_Output, "=");
  else
    Put_Line(Standard_Output, "/=");
  end if;
end Q;

x86 object:

q.o:     file format elf32-i386

Disassembly of section .text:

--
-- procedure Q is
-- begin
--

00000000 <_ada_q>:
   0:   55                      push   %ebp
   6:   89 e5                   mov    %esp,%ebp
   b:   83 ec 38                sub    $0x38,%esp
   e:   89 7d fc                mov    %edi,0xfffffffc(%ebp)
  16:   89 75 f8                mov    %esi,0xfffffff8(%ebp)

--
-- X := (0, 0, 0, 0);
--

   8:   fc                      cld    
   9:   31 c0                   xor    %eax,%eax
   1:   ba 04 00 00 00          mov    $0x4,%edx
  11:   89 d1                   mov    %edx,%ecx
  13:   8d 7d e8                lea    0xffffffe8(%ebp),%edi
  1c:   f3 ab                   repz stos %eax,%es:(%edi)

--
-- Y := (0, 0, 0, 0);
--

  21:   89 d1                   mov    %edx,%ecx
  1e:   8d 7d d8                lea    0xffffffd8(%ebp),%edi
  23:   f3 ab                   repz stos %eax,%es:(%edi)

--
-- OPTIMIZE
--
-- if X = Y then
--
-- In the previous two sequences we knew enough to use stosl loops
-- to initialize the X and Y arrays.  In this sequence we're comparing the
-- same two arrays for equality.  However, this sequence reverts to a
-- cmpsb (not a cmpsl) loop.  There is a 4:1 execution penalty for making
-- this decision with no impact on code size.
--

  28:   b9 10 00 00 00          mov    $0x10,%ecx
  19:   8d 75 d8                lea    0xffffffd8(%ebp),%esi
  25:   8d 7d e8                lea    0xffffffe8(%ebp),%edi
  2d:   f3 a6                   repz cmpsb %es:(%edi),%ds:(%esi)

--
-- Put_Line(Standard_Output, "=");
--

  2f:   75 2f                   jne    60 <_ada_q+0x60>
  31:   e8 fc ff ff ff          call   32 <_ada_q+0x32>
                        32: R_386_PC32  ada__text_io__standard_output
  36:   ba 10 00 00 00          mov    $0x10,%edx
                        37: R_386_32    .rodata
  3b:   b9 00 00 00 00          mov    $0x0,%ecx
                        3c: R_386_32    .rodata
  40:   89 54 24 04             mov    %edx,0x4(%esp,1)
  44:   89 4c 24 08             mov    %ecx,0x8(%esp,1)
  48:   89 04 24                mov    %eax,(%esp,1)
  4b:   e8 fc ff ff ff          call   4c <_ada_q+0x4c>
                        4c: R_386_PC32  ada__text_io__put_line

--
-- end if;
-- end Q;
--

  50:   8b 75 f8                mov    0xfffffff8(%ebp),%esi
  53:   8b 7d fc                mov    0xfffffffc(%ebp),%edi
  56:   89 ec                   mov    %ebp,%esp
  58:   5d                      pop    %ebp
  59:   c3                      ret    

--
-- Put_Line(Standard_Output, "/=");
--

  5a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
  60:   e8 fc ff ff ff          call   61 <_ada_q+0x61>
                        61: R_386_PC32  ada__text_io__standard_output
  65:   ba 11 00 00 00          mov    $0x11,%edx
                        66: R_386_32    .rodata
  6a:   b9 08 00 00 00          mov    $0x8,%ecx
                        6b: R_386_32    .rodata
  6f:   eb cf                   jmp    40 <_ada_q+0x40>

Comment 1 Andrew Pinski 2003-11-08 18:24:13 UTC

The problem is that Ada makes a "call" to memcmp, which has to be done byte-wise, so 
maybe the builtin expander needs some help to "fix" (help out) the call to memcmp.

Comment 2 Andrew Pinski 2004-10-29 14:05:25 UTC

Well, Ada does not build a memcmp, but the middle-end does, and we don't optimize memcmp that 
well in the middle-end/back-end.
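
To make the distinction in comments 1 and 2 concrete, here is a minimal C sketch (not GCC's implementation, and not what the Ada front end emits). A general memcmp has to report the ordering of the first differing byte, and on little-endian x86 a 32-bit word compare does not directly give that ordering. The test "X = Y" only needs equality, though, and for a 16-byte array of 32-bit words with no padding, four word compares give the same answer as sixteen byte compares. The names a_type, equal_bytewise and equal_wordwise are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C stand-in for the Ada type A_Type: four 32-bit words. */
typedef uint32_t a_type[4];

static_assert(sizeof(a_type) == 16, "A_Type is expected to occupy 16 bytes");

/* Equality expressed through memcmp, i.e. a byte-oriented comparison
   whose result is only tested against zero. */
static bool equal_bytewise(const a_type x, const a_type y)
{
    return memcmp(x, y, sizeof(a_type)) == 0;
}

/* Equality one 32-bit word at a time: at most 4 compares instead of 16.
   Valid only because the caller needs equality, not ordering. */
static bool equal_wordwise(const a_type x, const a_type y)
{
    for (size_t i = 0; i < 4; i++) {
        if (x[i] != y[i])
            return false;
    }
    return true;
}
```

Both functions return the same result for any pair of arrays of this type, which is why an expander that knows the memcmp result is only compared against zero, and that the length is a multiple of 4, could in principle pick the wider compare.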

Comment 3 Steven Bosscher 2009-06-22 23:22:18 UTC

Digging up old bugs can be fun...
Andrew, do you think this is perhaps fixed by Jakub's x86 mem* work?

Comment 4 Steven Bosscher 2011-05-22 14:37:33 UTC

Still a cmpsb there.