[Bug c/79507] New: Incorrect array item inlining when ASAN is enabled
zherczeg at inf dot u-szeged.hu
gcc-bugzilla@gcc.gnu.org
Tue Feb 14 13:10:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79507
Bug ID: 79507
Summary: Incorrect array item inlining when ASAN is enabled
Product: gcc
Version: lto
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: zherczeg at inf dot u-szeged.hu
Target Milestone: ---
Hi,
TL;DR: the inlined address of a static array item is invalid.
GCC version: gcc-5 (Ubuntu 5.4.1-2ubuntu1~14.04) 5.4.1 20160904
First, you need JerryScript:
https://github.com/jerryscript-project/jerryscript
Check out commit 66683e5d4b3d9474e86900b86be29105524b740c
Compile (note: LTO enabled):
tools/build.py --clean \
  --compile-flag="-fsanitize=address -m32 -fno-omit-frame-pointer -fno-common -g" \
  --linker-flag=-fsanitize=address \
  --jerry-libc=off --static-link=off --strip=off --system-allocator=on
Put the following into a test file (e.g. test.js):
''.replace(/^/g, 'b')
Then run:
build/bin/jerry test.js
Result: ASAN reports an error.
What happens?
There is a function in lit-magic-strings.c that returns a string based on an
ID. The strings are stored in a static const array:
const lit_utf8_byte_t *
lit_get_magic_string_utf8 (lit_magic_string_id_t id) /**< magic string id */
{
  static const lit_utf8_byte_t * const lit_magic_strings[] JERRY_CONST_DATA =
  {
#define LIT_MAGIC_STRING_FIRST_STRING_WITH_SIZE(size, id)
#define LIT_MAGIC_STRING_DEF(id, utf8_string) \
  (const lit_utf8_byte_t *) utf8_string,
#include "lit-magic-strings.inc.h"
#undef LIT_MAGIC_STRING_DEF
#undef LIT_MAGIC_STRING_FIRST_STRING_WITH_SIZE
  };

  JERRY_ASSERT (id < LIT_MAGIC_STRING__COUNT);
  return lit_magic_strings[id];
} /* lit_get_magic_string_utf8 */
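To make the lookup pattern concrete outside the JerryScript tree, here is a minimal, self-contained model of it. Everything in it is hypothetical: utf8_byte_t, magic_id_t, get_magic_string and the MAGIC_* names stand in for the real lit_* types and ids, and the MAGIC_STRINGS macro stands in for lit-magic-strings.inc.h. It is only a sketch of the X-macro table idea, not a confirmed reduced test case for this bug.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t utf8_byte_t; /* hypothetical stand-in for lit_utf8_byte_t */

/* Hypothetical stand-in for lit-magic-strings.inc.h: each entry expands
   through whatever MAGIC_STRING_DEF means at the point of use. */
#define MAGIC_STRINGS                       \
  MAGIC_STRING_DEF (MAGIC__EMPTY, "")       \
  MAGIC_STRING_DEF (MAGIC_LENGTH, "length") \
  MAGIC_STRING_DEF (MAGIC_REPLACE, "replace")

/* First expansion: the ids, in list order (so MAGIC__EMPTY == 0). */
typedef enum
{
#define MAGIC_STRING_DEF(id, utf8_string) id,
  MAGIC_STRINGS
#undef MAGIC_STRING_DEF
  MAGIC__COUNT
} magic_id_t;

/* Second expansion: the strings, in the same order, so that
   magic_strings[id] is the string that was defined together with id. */
const utf8_byte_t *
get_magic_string (magic_id_t id)
{
  static const utf8_byte_t * const magic_strings[] =
  {
#define MAGIC_STRING_DEF(id, utf8_string) (const utf8_byte_t *) utf8_string,
    MAGIC_STRINGS
#undef MAGIC_STRING_DEF
  };

  assert (id < MAGIC__COUNT);
  return magic_strings[id];
}
```

With a constant argument such as get_magic_string (MAGIC__EMPTY), the compiler may inline the call and fold the table load into an immediate address; the folded address must still be the value stored in magic_strings[0].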
The compiler tries to inline this function, which is a sensible optimization.
In ecma_regexp_exec_helper (ecma-regexp-object.c) you can find the following
code:
ECMA_STRING_TO_UTF8_STRING (input_string_p, input_buffer_p, input_buffer_size);

if (input_buffer_size == 0u)
{
  input_curr_p = lit_get_magic_string_utf8 (LIT_MAGIC_STRING__EMPTY);
}
else
{
  input_curr_p = input_buffer_p;
}
In the example above the string is empty, so input_buffer_size == 0, and
input_curr_p is loaded by the following instructions:
0x08060528 <ecma_regexp_exec_helper+333>: mov    $0x80ba160,%esi
0x0806052d <ecma_regexp_exec_helper+338>: mov    %eax,%edx   (not relevant, instruction scheduling)
0x0806052f <ecma_regexp_exec_helper+340>: mov    %eax,%edi   (not relevant, instruction scheduling)
0x08060531 <ecma_regexp_exec_helper+342>: test   %ecx,%ecx
0x08060533 <ecma_regexp_exec_helper+344>: cmovne -0x188(%ebp),%esi
So input_curr_p receives the value 0x80ba160. This value MUST be the same as
input_buffer_p, but the two differ when these compiler options are used. The
ecma_string_raw_chars function also calls lit_get_magic_string_utf8, but with
an id that is not a compile-time constant, so the array element is accessed
at run time:
0x080761c8 <ecma_string_raw_chars+466>: lea 0x80b1700(,%edx,4),%edi
(gdb) x 0x80b1700
0x80b1700 <lit_magic_strings.3362.9502>: 0x080ad940
As you can see, the first item of the array (LIT_MAGIC_STRING__EMPTY is equal
to 0) is 0x080ad940. Because 0x80ba160 != 0x080ad940, the code crashes later.
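The invariant being violated can be stated as a small C check: a call with a constant id (which the compiler may fold to an immediate address) and a call whose id is only known at run time must return the same pointer, because both read the same array element. The sketch below is hypothetical (get_magic_string, runtime_id and magic_pointers_agree are illustrative names, not JerryScript code), and it is not confirmed to reproduce the miscompilation; it only expresses the property that fails in the LTO + ASAN build described above.

```c
#include <stdint.h>

typedef uint8_t utf8_byte_t; /* hypothetical stand-in for lit_utf8_byte_t */

enum { MAGIC__EMPTY = 0, MAGIC_LENGTH = 1, MAGIC__COUNT = 2 };

/* Hypothetical reduced version of lit_get_magic_string_utf8. */
static const utf8_byte_t *
get_magic_string (unsigned id)
{
  static const utf8_byte_t * const magic_strings[MAGIC__COUNT] =
  {
    (const utf8_byte_t *) "",
    (const utf8_byte_t *) "length",
  };
  return magic_strings[id];
}

/* Read through a volatile object, so the compiler cannot treat the index
   as a constant and must access the array element at run time. */
static volatile unsigned runtime_id = MAGIC__EMPTY;

/* Returns 1 when the constant-id call and the run-time-id call yield the
   same pointer, which a correct compilation must guarantee. */
int
magic_pointers_agree (void)
{
  const utf8_byte_t *folded = get_magic_string (MAGIC__EMPTY);
  const utf8_byte_t *loaded = get_magic_string (runtime_id);
  return folded == loaded;
}
```

In the failing build, the equivalent of `folded` is 0x80ba160 and the equivalent of `loaded` is 0x080ad940, so this comparison would be false.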