This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Weak symbols vs. ptr-to-function: invalid optimisation or undefined behaviour?



    Afternoon all![*]


  A funny thing happened to me on the way to a shared libgcc (cygwin
target).  We've got a crtbegin that looks for the EH registration machinery
in a DLL if one is loaded; otherwise it checks whether the machinery is
statically linked, using a weak reference that (IIUIC) is intended not to
pull in the whole machinery unless the rest of the application actually
uses it.  If there isn't a strong undef somewhere else, the weak symbol
won't be pulled in, and it will resolve to zero at runtime.  So the relevant
code looks a bit like this (from gcc/config/i386/cygming-crtbegin.c):


extern void __register_frame_info (const void *, struct object *)
				   TARGET_ATTRIBUTE_WEAK;
extern void *__deregister_frame_info (const void *)
				      TARGET_ATTRIBUTE_WEAK;
void
__gcc_register_frame (void)
{
#if DWARF2_UNWIND_INFO
/* Weak undefined symbols won't be pulled in from dlls; hence
   we first test if the dll is already loaded and, if so,
   get the symbol's address at run-time.  If the dll is not loaded,
   fall back to weak linkage to static archive.  */

  void (*register_frame_fn) (const void *, struct object *);
  HANDLE h = GetModuleHandle (LIBGCC_SONAME);
  if (h)
    register_frame_fn = (void (*) (const void *, struct object *))
			GetProcAddress (h, "__register_frame_info");
  else
    register_frame_fn = __register_frame_info;
  if (register_frame_fn)
    register_frame_fn (__EH_FRAME_BEGIN__, &obj);
#endif
}

... and it compiles like so:

00000000 <___gcc_register_frame>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	83 ec 08             	sub    $0x8,%esp
   6:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
			9: dir32	.rdata
   d:	e8 00 00 00 00       	call   12 <___gcc_register_frame+0x12>
			e: DISP32	_GetModuleHandleA@4
  12:	89 c2                	mov    %eax,%edx
  14:	83 ec 04             	sub    $0x4,%esp
  17:	85 d2                	test   %edx,%edx
  19:	b8 00 00 00 00       	mov    $0x0,%eax
			1a: dir32	___register_frame_info
  1e:	74 13                	je     33 <___gcc_register_frame+0x33>
  20:	c7 44 24 04 0d 00 00 	movl   $0xd,0x4(%esp)
  27:	00 
			24: dir32	.rdata
  28:	89 14 24             	mov    %edx,(%esp)
  2b:	e8 00 00 00 00       	call   30 <___gcc_register_frame+0x30>
			2c: DISP32	_GetProcAddress@8
  30:	83 ec 08             	sub    $0x8,%esp
  33:	85 c0                	test   %eax,%eax
  35:	74 11                	je     48 <___gcc_register_frame+0x48>
  37:	c7 44 24 04 00 00 00 	movl   $0x0,0x4(%esp)
  3e:	00 
			3b: dir32	.bss
  3f:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
			42: dir32	.eh_frame
  46:	ff d0                	call   *%eax
  48:	c9                   	leave  
  49:	c3                   	ret    
  4a:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi

  Note in particular that the else { ... } arm of the if (h) clause has been
hoisted above the branch: %eax is loaded with the weak symbol's address at
offset 0x19, between the test of %edx at 0x17 and the je at 0x1e.  Should the
if (h) test fail, we branch at 0x1e around the then { ... } arm of the
clause, and where the code paths rejoin at 0x33 we test the value in %eax to
implement the "if (register_frame_fn)" test before calling through the
function pointer.

  I changed the code with the intention that, if looking up the symbol in
the DLL fails (not because the DLL isn't loaded, but because it doesn't
export the function), it would fall back to trying the weak reference:

  void (*register_frame_fn) (const void *, struct object *) = 0;
  HANDLE h = GetModuleHandle (LIBGCC_SONAME);
  if (h)
    register_frame_fn = (void (*) (const void *, struct object *))
			GetProcAddress (h, "__register_frame_info");
  if (!register_frame_fn)
    register_frame_fn = __register_frame_info;
  if (register_frame_fn)
    register_frame_fn (__EH_FRAME_BEGIN__, &obj);

and now it compiles like /this/:


00000070 <___gcc_register_frame>:
  70:	55                   	push   %ebp
  71:	89 e5                	mov    %esp,%ebp
  73:	83 ec 08             	sub    $0x8,%esp
  76:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
			79: dir32	.rdata
  7d:	e8 00 00 00 00       	call   82 <___gcc_register_frame+0x12>
			7e: DISP32	_GetModuleHandleA@4
  82:	83 ec 04             	sub    $0x4,%esp
  85:	85 c0                	test   %eax,%eax
  87:	74 2a                	je     b3 <___gcc_register_frame+0x43>
  89:	c7 44 24 04 0d 00 00 	movl   $0xd,0x4(%esp)
  90:	00 
			8d: dir32	.rdata
  91:	89 04 24             	mov    %eax,(%esp)
  94:	e8 00 00 00 00       	call   99 <___gcc_register_frame+0x29>
			95: DISP32	_GetProcAddress@8
  99:	83 ec 08             	sub    $0x8,%esp
  9c:	85 c0                	test   %eax,%eax
  9e:	74 13                	je     b3 <___gcc_register_frame+0x43>
  a0:	c7 44 24 04 00 00 00 	movl   $0x0,0x4(%esp)
  a7:	00 
			a4: dir32	.bss
  a8:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
			ab: dir32	.eh_frame
  af:	ff d0                	call   *%eax
  b1:	c9                   	leave  
  b2:	c3                   	ret    
  b3:	b8 00 00 00 00       	mov    $0x0,%eax
			b4: dir32	___register_frame_info
  b8:	eb e6                	jmp    a0 <___gcc_register_frame+0x30>
  ba:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi


  Now, there's been some code motion.  Both the "if (h)" test (offset 0x85),
and the "if (!register_frame_fn)" test (offset 0x9c) branch to offset 0xb3,
where the assignment from the weak reference is performed.  *Unlike* the
previous case, however, this clause rejoins the main code path (at offset
0xa0) at a point /after/ the test of the final "if (register_frame_fn)"
clause, i.e. it branches straight into the then { ... } arm.

  This can go wrong if the weak symbol was not resolved at link-time,
because the resolved value is then zero, nothing tests it on this path, and
the indirect call at 0xaf jumps to address zero.
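
  To make the difference concrete, here is a small model of the two control
flows in plain C.  It is only a sketch: the names frame_fn, struct outcome,
as_written, as_compiled and dummy_register are made up for illustration, the
pointers are passed in as parameters instead of coming from GetProcAddress or
a weak reference, and instead of making the indirect call each function just
reports whether a call would be made and through which pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for the type of __register_frame_info.  */
typedef void (*frame_fn) (const void *, void *);

/* What one pass through __gcc_register_frame ends up doing.  */
struct outcome
{
  int called;       /* nonzero if the indirect call is reached */
  frame_fn target;  /* the pointer that call would go through */
};

/* Do-nothing function, used only to obtain a non-null frame_fn.  */
static void
dummy_register (const void *begin, void *obj)
{
  (void) begin;
  (void) obj;
}

/* The modified source as written: the final null test guards every path.  */
static struct outcome
as_written (int dll_loaded, frame_fn from_dll, frame_fn weak_ref)
{
  struct outcome o = { 0, NULL };
  frame_fn fn = NULL;
  if (dll_loaded)
    fn = from_dll;      /* result of GetProcAddress, may be null */
  if (!fn)
    fn = weak_ref;      /* weak reference, null if unresolved */
  if (fn)
    {
      o.called = 1;
      o.target = fn;
    }
  return o;
}

/* The second listing: the fallback path rejoins after the final test.  */
static struct outcome
as_compiled (int dll_loaded, frame_fn from_dll, frame_fn weak_ref)
{
  struct outcome o = { 0, NULL };
  frame_fn fn = NULL;
  if (dll_loaded)
    fn = from_dll;      /* 0x89..0x9c */
  if (!fn)
    fn = weak_ref;      /* 0xb3, then jmp to 0xa0 with no test of fn */
  o.called = 1;         /* 0xaf: call *%eax on every path */
  o.target = fn;
  return o;
}
```

  With no DLL and an unresolved weak reference, as_written (0, NULL, NULL)
reports no call at all, while as_compiled (0, NULL, NULL) reports a call
through a null pointer.  Whenever either pointer is non-null, for example
dummy_register, the two agree.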

  So:

1.  Is GCC assuming that the constant pointer-to-function in the assignment
in

  if (!register_frame_fn)
    register_frame_fn = __register_frame_info;

must never be zero, because it's a pointer to a function?

2.  Is it right for GCC to assume that?  Normally address-of-function
constants are always non-zero, but weak ones can be zero.

  It would clearly be undefined code if we had written a function call that
was not resolved by the time it was executed at runtime, but here we're just
taking the address of the function as a constant, and it's supposed to be
zero if it's an unresolved weak reference.  So is the code bad, or is GCC
making an invalid assumption leading to an incorrect optimisation?  Stuff
like weak references is outside the scope of the C standard, so I may have
found an unfortunate corner-case here.
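
  For anyone wanting to poke at this outside of crtbegin, the idiom boils
down to something like the sketch below.  The assumptions are loud ones: it
targets GCC (or a compatible compiler) on an ELF system, not cygwin/PE;
hypothetical_optional_hook is an invented symbol that nothing defines; and the
second function, which reads the address back through a volatile object so
the compiler cannot fold the null test away, is just one defensive option I
can think of, not an established fix for the behaviour described above.

```c
#include <stddef.h>

/* Hypothetical optional hook.  Nothing defines it, so on an ELF target the
   weak reference is expected to resolve to a null address.  */
extern int hypothetical_optional_hook (int) __attribute__ ((weak));

typedef int (*hook_fn) (int);

/* The plain idiom: test the address of the weak symbol directly.
   Returns -1 when the hook is absent.  */
static int
call_hook_direct (int arg)
{
  if (hypothetical_optional_hook)
    return hypothetical_optional_hook (arg);
  return -1;
}

/* Defensive variant: store the address in a volatile object and read it
   back, so the value tested is one the compiler cannot assume anything
   about.  Returns -1 when the hook is absent.  */
static int
call_hook_via_volatile (int arg)
{
  hook_fn volatile stored = hypothetical_optional_hook;
  hook_fn fn = stored;   /* forced reload from the volatile object */
  if (fn)
    return fn (arg);
  return -1;
}
```

  With the hook left undefined, both functions should take the "absent"
path and return -1; linking in a real definition of the symbol makes both
call it instead.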


    cheers,
      DaveK

[*] - This offer void where prohibited by law, $TZ, or orbital dynamics.
-- 
Can't think of a witty .sigline today....

