This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

PATCH: hex_value shouldn't be signed


Because the expression libiberty.h's hex_value macro expands to is a
signed int, it can get sign-extended when used to parse addresses.
This first appeared to be a bug in the ihex reader, but I'm leaning
towards fixing it by changing hex_value's type.

Consider the following ihex file:

:0200000480007A
:101B6C00000000010000000200000003000000045F
:101B7C00000000050000000600000007000000083F
:101B8C00000000090000000A0000000B0000000C1F
:101B9C000000000D0000000E0000000F00000010FF
:101BAC0000000011000000120000001300000014DF
:101BBC0000000015000000160000001700000018BF
:101BCC00000000190000001A0000001B0000001C9F
:101BDC000000001D0000001E0000001F000000207F
:00000001FF

An ihex data record (type 00) has a fixed-width 16-bit address field.
To place data at addresses above 64k, a file can contain Extended
Linear Address records (type 04), which supply the top 16 address bits
for subsequent data records.

In the above example, the first line is an Extended Linear Address
record, setting the top 16 address bits to 0x8000.  The second line is
a data record with an offset of 0x1b6c; its data should be stored at
0x80001b6c.

On any system where bfd_vma is 64 bits long, BFD will actually store
that record at 0xffffffff80001b6c.  Since the value of the hex_value
macro (defined in libiberty.h) is signed, the type of a HEX2 or HEX4
macro (defined in ihex.c) is signed; casting that to an unsigned
64-bit integer type causes the value to be sign-extended.  When
processing the Extended Linear Address record, we shift 0x8000 left
sixteen bits, and then cast it to bfd_vma, yielding 0xffffffff80000000
instead of 0x0000000080000000.

This situation actually arises on Linux for the IBM s390x 64-bit
processor.  The linker likes to locate data around 0x80000000, so
dumping values in ihex format with GDB's 'dump' command produces ihex
values that the binutils won't process properly.

It's worth pointing out that the Intel Hex format spec (well, it's
described in appendices of other manuals) doesn't actually say
anything about how 32-bit addresess should be interpreted on systems
where addresses are 64 bits long.  And there are processors (e.g.,
MIPS) that customarily sign-extended 32-bit addresses when used in a
64-bit context.  So while this patch fixes a bug on the s390x, it's
not clear it's universally right, and there's no authority to settle
the question.  And since ihex records don't give any indication of the
architecture they're intended for (they're just a hex dump), BFD can't
discriminate based on the architecture.

But I'd argue that the format itself treats addresses as unsigned.
Data records' 16-bit addresses are not sign-extended when an ihex file
is loaded into a 32-bit system in the absence of a Extended Linear
Address records.  If one uses Extended Address Records to generate
20-bit addresses, those aren't sign-extended to make 32-bit addresses.
So it seems most consistent not to sign-extend 32-bit addresses on a
64-bit system.

Here is a patch.  Based on eyeballing the way hex_value gets used in
bfd (other hex dump format readers), gas (parsing constants), and gcc
(same), I don't think this will break anything.

libiberty/ChangeLog:
2003-05-14  Jim Blandy  <jimb@redhat.com>

	* hex.c (_hex_value): Make this unsigned.
	(hex_value): Update documentation for new return type.  hex_value
	now expands to an unsigned int expression, to avoid unexpected
	sign extension when we store it in a bfd_vma, which is larger than
	int on some platforms.
	* functions.texi: Regenerated.

include/ChangeLog:
2003-05-14  Jim Blandy  <jimb@redhat.com>

	* libiberty.h (hex_value): Make the value an unsigned int, to
	avoid unexpected sign-extension when cast to unsigned types larger
	than int --- like bfd_vma, on some platforms.
	(_hex_value): Update declaration.

Index: include/libiberty.h
===================================================================
RCS file: /cvs/src/src/include/libiberty.h,v
retrieving revision 1.23
diff -c -r1.23 libiberty.h
*** include/libiberty.h	27 Feb 2003 21:01:01 -0000	1.23
--- include/libiberty.h	14 May 2003 19:42:49 -0000
***************
*** 254,265 ****
  
  #define _hex_array_size 256
  #define _hex_bad	99
! extern const char _hex_value[_hex_array_size];
  extern void hex_init PARAMS ((void));
  #define hex_p(c)	(hex_value (c) != _hex_bad)
  /* If you change this, note well: Some code relies on side effects in
     the argument being performed exactly once.  */
! #define hex_value(c)	(_hex_value[(unsigned char) (c)])
  
  /* Definitions used by the pexecute routine.  */
  
--- 254,265 ----
  
  #define _hex_array_size 256
  #define _hex_bad	99
! extern const unsigned char _hex_value[_hex_array_size];
  extern void hex_init PARAMS ((void));
  #define hex_p(c)	(hex_value (c) != _hex_bad)
  /* If you change this, note well: Some code relies on side effects in
     the argument being performed exactly once.  */
! #define hex_value(c)	((unsigned int) _hex_value[(unsigned char) (c)])
  
  /* Definitions used by the pexecute routine.  */
  
Index: libiberty/functions.texi
===================================================================
RCS file: /cvs/src/src/libiberty/functions.texi,v
retrieving revision 1.12
diff -c -r1.12 functions.texi
*** libiberty/functions.texi	16 Apr 2003 23:09:21 -0000	1.12
--- libiberty/functions.texi	14 May 2003 19:42:54 -0000
***************
*** 337,348 ****
  @end deftypefn
  
  @c hex.c:42
! @deftypefn Extension int hex_value (int @var{c})
  
  Returns the numeric equivalent of the given character when interpreted
  as a hexidecimal digit.  The result is undefined if you pass an
  invalid hex digit.  Note that the value you pass will be cast to
  @code{unsigned char} within the macro.
  
  @end deftypefn
  
--- 337,354 ----
  @end deftypefn
  
  @c hex.c:42
! @deftypefn Extension unsigned int hex_value (int @var{c})
  
  Returns the numeric equivalent of the given character when interpreted
  as a hexidecimal digit.  The result is undefined if you pass an
  invalid hex digit.  Note that the value you pass will be cast to
  @code{unsigned char} within the macro.
+ 
+ The @code{hex_value} macro returns @code{unsigned int}, rather than
+ signed @code{int}, to make it easier to use in parsing addresses from
+ hex dump files: a signed @code{int} would be sign-extended when
+ converted to a wider unsigned type --- like @code{bfd_vma}, on some
+ systems.
  
  @end deftypefn
  
Index: libiberty/hex.c
===================================================================
RCS file: /cvs/src/src/libiberty/hex.c,v
retrieving revision 1.3
diff -c -r1.3 hex.c
*** libiberty/hex.c	28 Mar 2002 04:06:38 -0000	1.3
--- libiberty/hex.c	14 May 2003 19:42:54 -0000
***************
*** 39,51 ****
  
  @end deftypefn
  
! @deftypefn Extension int hex_value (int @var{c})
  
  Returns the numeric equivalent of the given character when interpreted
  as a hexidecimal digit.  The result is undefined if you pass an
  invalid hex digit.  Note that the value you pass will be cast to
  @code{unsigned char} within the macro.
  
  @end deftypefn
  
  @undocumented _hex_array_size
--- 39,57 ----
  
  @end deftypefn
  
! @deftypefn Extension unsigned int hex_value (int @var{c})
  
  Returns the numeric equivalent of the given character when interpreted
  as a hexidecimal digit.  The result is undefined if you pass an
  invalid hex digit.  Note that the value you pass will be cast to
  @code{unsigned char} within the macro.
  
+ The @code{hex_value} macro returns @code{unsigned int}, rather than
+ signed @code{int}, to make it easier to use in parsing addresses from
+ hex dump files: a signed @code{int} would be sign-extended when
+ converted to a wider unsigned type --- like @code{bfd_vma}, on some
+ systems.
+ 
  @end deftypefn
  
  @undocumented _hex_array_size
***************
*** 60,66 ****
    && 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21 \
    && EOF == -1
  
! const char _hex_value[_hex_array_size] =
  {
    _hex_bad, _hex_bad, _hex_bad, _hex_bad,   /* NUL SOH STX ETX */
    _hex_bad, _hex_bad, _hex_bad, _hex_bad,   /* EOT ENQ ACK BEL */
--- 66,72 ----
    && 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21 \
    && EOF == -1
  
! const unsigned char _hex_value[_hex_array_size] =
  {
    _hex_bad, _hex_bad, _hex_bad, _hex_bad,   /* NUL SOH STX ETX */
    _hex_bad, _hex_bad, _hex_bad, _hex_bad,   /* EOT ENQ ACK BEL */
***************
*** 139,145 ****
  
  #else
  
! char _hex_value[_hex_array_size];
  
  #endif /* not ASCII */
  
--- 145,151 ----
  
  #else
  
! unsigned char _hex_value[_hex_array_size];
  
  #endif /* not ASCII */
  


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]