[Bug c/79507] New: Incorrect array item inlining when ASAN is enabled

zherczeg at inf dot u-szeged.hu gcc-bugzilla@gcc.gnu.org
Tue Feb 14 13:10:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79507

            Bug ID: 79507
           Summary: Incorrect array item inlining when ASAN is enabled
           Product: gcc
           Version: lto
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zherczeg at inf dot u-szeged.hu
  Target Milestone: ---

Hi,

TL;DR: the inlined address of a static array item is invalid.

GCC version: gcc-5 (Ubuntu 5.4.1-2ubuntu1~14.04) 5.4.1 20160904

First, you need JerryScript:
https://github.com/jerryscript-project/jerryscript

Check out commit 66683e5d4b3d9474e86900b86be29105524b740c

Compile (note: LTO enabled):

tools/build.py --clean --compile-flag="-fsanitize=address -m32 -fno-omit-frame-pointer -fno-common -g" --linker-flag=-fsanitize=address --jerry-libc=off --static-link=off --strip=off --system-allocator=on

Put the following into a test file (e.g. test.js):

''.replace(/^/g, 'b')

build/bin/jerry test.js

Result: ASAN error

What happens?

There is a function in lit-magic-strings.c which returns a string based on
an ID. The strings are stored in a static const array.

const lit_utf8_byte_t *
lit_get_magic_string_utf8 (lit_magic_string_id_t id) /**< magic string id */
{
  static const lit_utf8_byte_t * const lit_magic_strings[] JERRY_CONST_DATA =
  {
#define LIT_MAGIC_STRING_FIRST_STRING_WITH_SIZE(size, id)
#define LIT_MAGIC_STRING_DEF(id, utf8_string) \
    (const lit_utf8_byte_t *) utf8_string,
#include "lit-magic-strings.inc.h"
#undef LIT_MAGIC_STRING_DEF
#undef LIT_MAGIC_STRING_FIRST_STRING_WITH_SIZE
  };

  JERRY_ASSERT (id < LIT_MAGIC_STRING__COUNT);

  return lit_magic_strings[id];
} /* lit_get_magic_string_utf8 */
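
The table above is built with an X-macro: the same list of entries is
expanded once per use. A minimal self-contained sketch of that pattern follows.
It assumes a list macro instead of the real lit-magic-strings.inc.h include,
and every demo_/DEMO_ name and string in it is hypothetical, not taken from
JerryScript.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char demo_byte_t; /* stand-in for lit_utf8_byte_t */

/* Hypothetical entry list; the real entries live in lit-magic-strings.inc.h. */
#define DEMO_MAGIC_STRINGS(DEF) \
  DEF (DEMO_STRING__EMPTY, "") \
  DEF (DEMO_STRING_LENGTH, "length") \
  DEF (DEMO_STRING_REPLACE, "replace")

/* First expansion: one enumerator per entry, followed by the entry count. */
typedef enum
{
#define DEMO_DEF_ID(id, utf8_string) id,
  DEMO_MAGIC_STRINGS (DEMO_DEF_ID)
#undef DEMO_DEF_ID
  DEMO_STRING__COUNT
} demo_string_id_t;

/* Second expansion: one pointer per entry, in the same order as the ids. */
const demo_byte_t *
demo_get_magic_string (demo_string_id_t id)
{
  static const demo_byte_t * const demo_magic_strings[] =
  {
#define DEMO_DEF_STRING(id, utf8_string) (const demo_byte_t *) utf8_string,
    DEMO_MAGIC_STRINGS (DEMO_DEF_STRING)
#undef DEMO_DEF_STRING
  };

  assert (id < DEMO_STRING__COUNT);

  return demo_magic_strings[id];
}
```

Calling demo_get_magic_string (DEMO_STRING__EMPTY) returns a pointer to the
first table entry, which is the case that matters in this report.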

The compiler inlines this function, which is a sensible optimization.

In ecma_regexp_exec_helper (ecma-regexp-object.c) you can find the following
code:

  ECMA_STRING_TO_UTF8_STRING (input_string_p, input_buffer_p, input_buffer_size);

  if (input_buffer_size == 0u)
  {
    input_curr_p = lit_get_magic_string_utf8 (LIT_MAGIC_STRING__EMPTY);
  }
  else
  {
    input_curr_p = input_buffer_p;
  }

In the example above, the string is empty, so input_buffer_size == 0,
and input_curr_p is loaded by the following instructions:

   0x08060528 <ecma_regexp_exec_helper+333>:    mov    $0x80ba160,%esi
   0x0806052d <ecma_regexp_exec_helper+338>:    mov    %eax,%edx (not relevant, instruction scheduling)
   0x0806052f <ecma_regexp_exec_helper+340>:    mov    %eax,%edi (not relevant, instruction scheduling)
   0x08060531 <ecma_regexp_exec_helper+342>:    test   %ecx,%ecx
   0x08060533 <ecma_regexp_exec_helper+344>:    cmovne -0x188(%ebp),%esi

So input_curr_p receives the value 0x80ba160. This value MUST be the same as
input_buffer_p, but the two differ when these compiler options are used. The
ecma_string_raw_chars function also calls lit_get_magic_string_utf8, but with
an indirect id, so there the address is computed from the array at run time:

0x080761c8 <ecma_string_raw_chars+466>:      lea    0x80b1700(,%edx,4),%edi

(gdb) x 0x80b1700
0x80b1700 <lit_magic_strings.3362.9502>:        0x080ad940
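
The lea instruction above forms the address of a table entry as base plus
index times 4, since pointers are 4 bytes wide under -m32. The same arithmetic
can be sketched in C as follows; the demo_ names are hypothetical, and the
scale is written as sizeof so the sketch also holds on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the lit_magic_strings table. */
static const char * const demo_strings[] = { "", "length", "replace" };

/* Computes base + idx * scale, where scale is the size of one pointer. */
uintptr_t
demo_element_address (unsigned idx)
{
  return (uintptr_t) demo_strings + (uintptr_t) idx * sizeof (demo_strings[0]);
}
```

For idx == 0 the result is the address of the table itself, which corresponds
to the 0x80b1700 shown by gdb.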

As you can see, the first item of the array (LIT_MAGIC_STRING__EMPTY is equal
to 0) is 0x080ad940.

Because 0x80ba160 != 0x080ad940 the code crashes later.
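
The invariant that is violated here can be written as a small check: a lookup
with a constant index, which the compiler may fold into an immediate, must
yield the same pointer as a lookup with a run-time index. The following is
only a sketch of that invariant with hypothetical demo_ names. It is not a
confirmed reproducer; the failure was only observed in the full JerryScript
build with the options listed above.

```c
#include <assert.h>

/* Hypothetical stand-in for the lit_magic_strings table. */
static const char * const demo_table[] = { "", "length" };

/* Plain table lookup; small enough that the compiler may inline it. */
static const char *
demo_lookup (unsigned idx)
{
  return demo_table[idx];
}

/* Returns 1 if the constant-index and run-time-index lookups agree. */
int
demo_addresses_agree (void)
{
  volatile unsigned runtime_idx = 0; /* volatile keeps the index out of constant folding */
  const char *folded = demo_lookup (0);           /* index known at compile time */
  const char *loaded = demo_lookup (runtime_idx); /* index read at run time */

  return folded == loaded;
}
```

With a correct compiler demo_addresses_agree () returns 1. In the report, the
corresponding comparison is 0x80ba160 against 0x080ad940.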

