Ada source: procedure T is type Var_Array is array (Integer range <>) of Integer; type Array_Ptr is access Var_Array; X : Array_Ptr; begin X := new Var_Array(1..10); if X(23) /= 1 then X(23) := 2; end if; end; x86 object: (see comments below in '--') t.o: file format elf32-i386 Disassembly of section .text: -- -- procedure T is -- begin -- 00000000 <_ada_t>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 83 ec 18 sub $0x18,%esp -- -- X := new Var_Array(1..10); -- 6: c7 04 24 30 00 00 00 movl $0x30,(%esp,1) d: 89 5d fc mov %ebx,0xfffffffc(%ebp) 10: e8 fc ff ff ff call 11 <_ada_t+0x11> 11: R_386_PC32 __gnat_malloc 15: c7 00 01 00 00 00 movl $0x1,(%eax) 20: c7 40 04 0a 00 00 00 movl $0xa,0x4(%eax) -- -- BUG -- -- This sequence appears to be checking for X = null. In fact, what it is -- really doing is checking for X(1)'Access = null. -- -- The compiler is going out of its way to cache X(X'First)'Access. -- The problem with the whole X(1)'Access approach is that it allocates -- a register (edx) to hold a value of no import. Since the x86 has a dearth -- of registers to begin with, why not simply hold on to X.Access and -- displace X by the dope vector size (8 bytes). There doesn't appear to be -- any register pressure in this procedure, but why allocate an unnecessary -- register? -- 1b: 8d 50 08 lea 0x8(%eax),%edx 27: 85 d2 test %edx,%edx 29: 74 59 je 84 <_ada_t+0x84> -- -- OPTIMIZE -- -- if X(23) /= 1 then -- -- The bounds check could/should be optimized. Ignore the fact that we -- could statically determine X's range. The current range check does: -- -- if Index < X'First then fail; -- if Index > X'Last then fail; -- Temp := Index - X'First; -- ... -- -- which is the same as: -- -- Temp := Index - X'First; -- if Temp < 0 then fail; -- if Index > X'Last then fail; -- ... -- -- depending on life-time analysis it could become: -- -- if Index > X'Last then fail; -- Index := Index - X'First; -- if Index < 0 then fail; -- ... -- -- which saves 1 register and should be a condition code check against -- the subtract. -- 1e: 89 c1 mov %eax,%ecx 2b: 8b 00 mov (%eax),%eax 2d: 83 f8 17 cmp $0x17,%eax 30: 7f 06 jg 38 <_ada_t+0x38> 32: 83 79 04 16 cmpl $0x16,0x4(%ecx) 36: 7f 18 jg 50 <_ada_t+0x50> 38: c7 04 24 00 00 00 00 movl $0x0,(%esp,1) 3b: R_386_32 .rodata 3f: b8 0d 00 00 00 mov $0xd,%eax 44: 89 44 24 04 mov %eax,0x4(%esp,1) 48: e8 fc ff ff ff call 49 <_ada_t+0x49> 49: R_386_PC32 __gnat_rcheck_05 4d: 8d 76 00 lea 0x0(%esi),%esi 50: bb 17 00 00 00 mov $0x17,%ebx 55: 29 c3 sub %eax,%ebx 57: 83 3c 9a 01 cmpl $0x1,(%edx,%ebx,4) 5b: 74 0b je 68 <_ada_t+0x68> -- -- BUG -- -- X(23) := 2; -- -- Same bad if X(1).Access = null check again. -- -- What is odd about this sequence is that the compiler knew enough to -- elide the bounds check on the index, but the = null check (which was -- previously performed) still exists here. -- 5d: 85 d2 test %edx,%edx 5f: 74 0e je 6f <_ada_t+0x6f> 61: c7 04 9a 02 00 00 00 movl $0x2,(%edx,%ebx,4) 68: 8b 5d fc mov 0xfffffffc(%ebp),%ebx -- -- end if; -- end; -- 6b: 89 ec mov %ebp,%esp 6d: 5d pop %ebp 6e: c3 ret -- -- Range-check failure code. -- 6f: ba 0e 00 00 00 mov $0xe,%edx 74: 89 54 24 04 mov %edx,0x4(%esp,1) 78: c7 04 24 00 00 00 00 movl $0x0,(%esp,1) 7b: R_386_32 .rodata 7f: e8 fc ff ff ff call 80 <_ada_t+0x80> 80: R_386_PC32 __gnat_rcheck_00 84: b8 0d 00 00 00 mov $0xd,%eax 89: 89 44 24 04 mov %eax,0x4(%esp,1) 8d: eb e9 jmp 78 <_ada_t+0x78>
This part: -- This sequence appears to be checking for X = null. In fact, what it is -- really doing is checking for X(1)'Access = null. -- -- The compiler is going out of its way to cache X(X'First)'Access. -- The problem with the whole X(1)'Access approach is that it allocates -- a register (edx) to hold a value of no import. Since the x86 has a dearth -- of registers to begin with, why not simply hold on to X.Access and -- displace X by the dope vector size (8 bytes). There doesn't appear to be -- any register pressure in this procedure, but why allocate an unnecessary -- register? Looks like an Ada frontend bug and produces worse code for the PPC because storing on the stack.
This is now implemented. Arno
Ooops, wrong PR, I meant to close 12950, bugzilla automatically pointed me to the next ada PR :-( Arno
I agrue with you an can confirm this on powerpc-apple-darwin also.
Fixed on the mainline by the tree optimizators.