This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.
Re: How to control GCC builtin functions optimization
- From: Cao jin <caoj dot fnst at cn dot fujitsu dot com>
- To: <gcc-help at gcc dot gnu dot org>
- Cc: Jakub Jelinek <jakub at redhat dot com>, Martin Liška <mliska at suse dot cz>
- Date: Fri, 11 Jan 2019 19:35:50 +0800
- Subject: Re: How to control GCC builtin functions optimization
- References: <38b466d6-f905-0fb8-e9aa-04c7aac5e3d3@cn.fujitsu.com>
Hi,
Please bear with my elaboration of the problem.
There is a string.h that has:
#define memcpy(d,s,l) __builtin_memcpy(d,s,l)
#define memset(d,c,l) __builtin_memset(d,c,l)
#define memcmp __builtin_memcmp
and string.c provides function implementations of them.
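To make this arrangement concrete, here is a minimal single-file sketch (not the kernel's actual code) of the header/implementation split, using memcmp because it is the object-like macro in the list above; the same pattern applies to memcpy and memset. The implementation file has to #undef the macro, otherwise the definition would be renamed to __builtin_memcmp. When GCC decides not to expand __builtin_memcmp inline, it emits an ordinary call to memcmp, and in the freestanding boot link that symbol resolves to this local definition.

```c
#include <stddef.h>

/* --- string.h (sketch): route the name to the GCC builtin --- */
#define memcmp __builtin_memcmp

/* --- string.c (sketch): provide the out-of-line fallback --- */
#undef memcmp /* otherwise this definition would be renamed to __builtin_memcmp */

/* Called whenever GCC chooses not to expand __builtin_memcmp inline. */
int memcmp(const void *s1, const void *s2, size_t len)
{
    const unsigned char *a = s1, *b = s2;

    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return a[i] - b[i]; /* sign of the first differing byte */
    }
    return 0;
}
```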
In the Linux kernel, both arch/x86/boot/compressed/pgtable_64.c and
arch/x86/boot/compressed/kaslr.c include string.h, and both .c files use
memcpy. But according to the nm output of the two .o files, kaslr.o has a
memcpy entry, while pgtable_64.o doesn't. Apparently, __builtin_memcpy is
sometimes expanded to inline code and sometimes emitted as a call to the
local memcpy().
So the answer seems to be: GCC uses heuristics to decide how to expand
__builtin_memcpy, so the result is not fixed; it is decided case by case.
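A small experiment along these lines can show the heuristic at work. This is a sketch under assumptions: on x86-64 at -O2, GCC typically expands a copy whose length is a small compile-time constant inline, while a length known only at run time often becomes a call to memcpy, but the exact thresholds depend on the GCC version and on tuning flags. The struct and function names are hypothetical. Compiling it with `gcc -O2 -c` and running `nm` on the object would be expected to list `U memcpy` only if some copy was not inlined.

```c
#include <stddef.h>

#define memcpy(d, s, l) __builtin_memcpy(d, s, l)

/* Hypothetical 8-byte header, used only for illustration. */
struct boot_header {
    unsigned int magic;
    unsigned int len;
};

/* Length is a small compile-time constant (sizeof): GCC usually
 * expands this inline as a few moves, with no call emitted. */
void copy_header(struct boot_header *dst, const struct boot_header *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Length known only at run time: GCC often emits a call to memcpy,
 * which the linker must resolve (libc in a hosted build, the local
 * string.c implementation in the boot code). */
void copy_bytes(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Comparing the disassembly (for example with `objdump -d`) of the two functions is one way to see the difference; flags such as -mno-sse or -mstringop-strategy can change the outcome.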
And -mstringop-strategy=byte_loop helped me confirm that this
optimization behaviour can be controlled.
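Besides global flags such as -mstringop-strategy=byte_loop (or =libcall, which on x86 asks GCC to always use a library call), another way to take the decision out of GCC's hands is to call a function that GCC does not know as a builtin. -fno-builtin-memcpy alone would not help, because it does not affect explicit __builtin_memcpy calls; the boot code's -ffreestanding already implies -fno-builtin, which is presumably why string.h maps the names to __builtin_* in the first place. This sketch assumes a hypothetical out-of-line boot_memcpy and a hypothetical BOOT_FORCE_CALL switch; neither exists in the kernel.

```c
#include <stddef.h>

/* Hypothetical out-of-line copy whose name GCC does not treat as a
 * builtin. noinline stops GCC from inlining it into callers.
 * Note: recent GCC versions at -O2 and above may still recognise the
 * loop as a copy idiom and replace it with a library call. */
__attribute__((noinline))
void *boot_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dest;
}

/* Hypothetical switch: build with -DBOOT_FORCE_CALL to make every
 * memcpy() a call to boot_memcpy instead of leaving it to GCC. */
#ifdef BOOT_FORCE_CALL
#define memcpy(d, s, l) boot_memcpy(d, s, l)
#else
#define memcpy(d, s, l) __builtin_memcpy(d, s, l)
#endif

void copy_buf(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

The trade-off is that every copy, even a fixed 8-byte one, then pays for a function call, which is usually why code leaves the choice to GCC.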
Thank you both, Jakub and Martin! This freed me from the headache of the
past two days :)
--
Sincerely,
Cao jin
On 1/11/19 11:03 AM, Cao jin wrote:
> Hi,
> (please CC me when replying, because I am not a subscriber)
>
> I came across an interesting phenomenon while looking into Linux kernel
> compilation. It can be summarized as follows: in
> arch/x86/boot/compressed, memcpy is defined as __builtin_memcpy, while
> also being implemented as a function. But when memcpy is used, in some
> cases GCC optimizes it to inline code, and in other cases GCC just emits
> a call to the self-defined memcpy function. This can be confirmed from
> the symbol table via `nm bluh.o`.
>
> The compilation flags are, for example:
> cmd_arch/x86/boot/compressed/pgtable_64.o := gcc
> -Wp,-MD,arch/x86/boot/compressed/.pgtable_64.o.d -nostdinc -isystem
> /usr/lib/gcc/x86_64-redhat-linux/8/include -I./arch/x86/include
> -I./arch/x86/include/generated -I./include
> -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi
> -I./include/uapi -I./include/generated/uapi -include
> ./include/linux/kconfig.h -include ./include/linux/compiler_types.h
> -D__KERNEL__ -DCONFIG_CC_STACKPROTECTOR -m64 -O2 -fno-strict-aliasing
> -fPIE -DDISABLE_BRANCH_PROFILING -mcmodel=small -mno-mmx -mno-sse
> -ffreestanding -fno-stack-protector -DKBUILD_BASENAME='"pgtable_64"'
> -DKBUILD_MODNAME='"pgtable_64"' -c -o
> arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/pgtable_64.c
>
> Now the question is: from reading the code, this is rather
> non-intuitive. Is there any explicit way to control the optimization
> behavior precisely?
>