gnatmake -O3 q

See OPTIMIZE below.

Ada source:

with Ada.Text_IO; use Ada.Text_IO;

procedure Q is
   type Unsigned_32 is mod 2 ** 32;
   type A_Type is array (1..4) of Unsigned_32;
   X: A_Type := (0, 0, 0, 0);
   Y: A_Type := (0, 0, 0, 0);
begin
   if X = Y then
      Put_Line(Standard_Output, "=");
   else
      Put_Line(Standard_Output, "/=");
   end if;
end Q;

x86 object:

q.o:     file format elf32-i386

Disassembly of section .text:

--
-- procedure Q is
-- begin
--
00000000 <_ada_q>:
   0:   55                      push   %ebp
   6:   89 e5                   mov    %esp,%ebp
   b:   83 ec 38                sub    $0x38,%esp
   e:   89 7d fc                mov    %edi,0xfffffffc(%ebp)
  16:   89 75 f8                mov    %esi,0xfffffff8(%ebp)
--
-- X := (0, 0, 0, 0);
--
   8:   fc                      cld
   9:   31 c0                   xor    %eax,%eax
   1:   ba 04 00 00 00          mov    $0x4,%edx
  11:   89 d1                   mov    %edx,%ecx
  13:   8d 7d e8                lea    0xffffffe8(%ebp),%edi
  1c:   f3 ab                   repz stos %eax,%es:(%edi)
--
-- Y := (0, 0, 0, 0);
--
  21:   89 d1                   mov    %edx,%ecx
  1e:   8d 7d d8                lea    0xffffffd8(%ebp),%edi
  23:   f3 ab                   repz stos %eax,%es:(%edi)
--
-- OPTIMIZE
--
-- if X = Y then
--
-- In the previous two sequences we knew enough to use stosl loops
-- to initialize the X and Y arrays. In this sequence we're comparing the
-- same two arrays for equality. However, this sequence reverts to a
-- cmpsb (not a cmpsl) loop. There is a 4:1 execution penalty for making
-- this decision with no impact on code size.
--
  28:   b9 10 00 00 00          mov    $0x10,%ecx
  19:   8d 75 d8                lea    0xffffffd8(%ebp),%esi
  25:   8d 7d e8                lea    0xffffffe8(%ebp),%edi
  2d:   f3 a6                   repz cmpsb %es:(%edi),%ds:(%esi)
--
-- Put_Line(Standard_Output, "=");
--
  2f:   75 2f                   jne    60 <_ada_q+0x60>
  31:   e8 fc ff ff ff          call   32 <_ada_q+0x32>
                        32: R_386_PC32  ada__text_io__standard_output
  36:   ba 10 00 00 00          mov    $0x10,%edx
                        37: R_386_32    .rodata
  3b:   b9 00 00 00 00          mov    $0x0,%ecx
                        3c: R_386_32    .rodata
  40:   89 54 24 04             mov    %edx,0x4(%esp,1)
  44:   89 4c 24 08             mov    %ecx,0x8(%esp,1)
  48:   89 04 24                mov    %eax,(%esp,1)
  4b:   e8 fc ff ff ff          call   4c <_ada_q+0x4c>
                        4c: R_386_PC32  ada__text_io__put_line
--
-- end if;
-- end Q;
--
  50:   8b 75 f8                mov    0xfffffff8(%ebp),%esi
  53:   8b 7d fc                mov    0xfffffffc(%ebp),%edi
  56:   89 ec                   mov    %ebp,%esp
  58:   5d                      pop    %ebp
  59:   c3                      ret
--
-- Put_Line(Standard_Output, "/=");
--
  5a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
  60:   e8 fc ff ff ff          call   61 <_ada_q+0x61>
                        61: R_386_PC32  ada__text_io__standard_output
  65:   ba 11 00 00 00          mov    $0x11,%edx
                        66: R_386_32    .rodata
  6a:   b9 08 00 00 00          mov    $0x8,%ecx
                        6b: R_386_32    .rodata
  6f:   eb cf                   jmp    40 <_ada_q+0x40>
The problem is that Ada makes a "call" to memcmp, which has to be done byte-wise, so maybe the builtin expander needs some help to "fix" the call to memcmp.
Well, Ada does not build a memcmp, but the middle-end does, and we don't optimize memcmp very well in the middle-end/back-end.
Digging old bugs can be fun... Andrew, do you think this is perhaps fixed by Jakub's x86 mem* work?
Still a cmpsb there.
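For reference, a minimal C sketch of the difference under discussion. It is not GCC code, and the names equal_words and equal_bytes are hypothetical. It assumes the arrays hold 32-bit elements and that only equality is tested, as in "if X = Y". Under that assumption a word-at-a-time loop (roughly what a cmpsl sequence would do) gives the same answer as a byte-wise memcmp tested against zero, while needing a quarter of the iterations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: equality-only comparison, one 32-bit word per
   iteration. For the 4-element arrays in the report this is at most
   4 iterations. */
int equal_words(const uint32_t *a, const uint32_t *b, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (a[i] != b[i])
            return 0;   /* first differing word: not equal */
    }
    return 1;           /* every word matched */
}

/* Hypothetical helper: the byte-wise form. memcmp orders its operands
   by the first differing byte, but only "== 0" is used here, so the
   ordering information is discarded. */
int equal_bytes(const uint32_t *a, const uint32_t *b, size_t nwords)
{
    return memcmp(a, b, nwords * sizeof *a) == 0;
}
```

Both helpers agree for any input. The word-wise form would not be a valid replacement if the sign of the memcmp result were needed, since on a little-endian target the first differing byte does not decide the ordering of the containing words.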