Created attachment 42013 [details]
memset_test

Compiling the attached source without a trivial memset implementation fails with an undefined reference to `memset' when using:

OPTFLAGS = -Os -g -mabi=aapcs -fno-function-sections -Wall -mfloat-abi=soft -mtune=cortex-a9

It succeeds with:

OPTFLAGS = -Os -g -mabi=aapcs -fno-function-sections -Wall -mfloat-abi=soft -mtune=cortex-a12

Using -O2 instead of -Os (optimization level) also fixes the failure.

What is the difference in optimization behavior (implementation) in GCC between cortex-a9 and cortex-a12, as given by the -mcpu or -mtune option?

Found a related issue in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888.
> What is different optimization behavior(implementation) in GCC between cortex-a9 and cortex-a12 -given by mcpu or mtune option ?

Different tuning. Though at -Os the output should probably be almost the same, except for the allowance for using the instructions that are in cortex-a12 rather than a9 (for the -mcpu case).

But really, memset is part of the C standard here and you don't use the -fno-hoisting option; though IIRC that still requires memset to be included in your libc.
(In reply to dongkyun.s from comment #0)
> Found related issue in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888.

Unrelated bug report.
Not a bug, see PR 63393 comment #5 for explanation of why.

*** This bug has been marked as a duplicate of bug 63393 ***
Created attachment 42014 [details]
memset_test_cortex-a9.o (made by '-Os -mtune=cortex-a12')

./gcc-linaro-6.3.1-2017.02-x86_64_arm-linux-gnueabi/bin/arm-linux-gnueabi-objdump -d memset_test_cortex-a12.o

memset_test_cortex-a12.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <func1>:
   0: b530       push   {r4, r5, lr}
   2: f1a0 0208  sub.w  r2, r0, #8
   6: 460c       mov    r4, r1
   8: 2300       movs   r3, #0
   a: 2000       movs   r0, #0
   c: 2100       movs   r1, #0
   e: 42a3       cmp    r3, r4
  10: db00       blt.n  14 <func1+0x14>
  12: bd30       pop    {r4, r5, pc}
  14: f852 5f08  ldr.w  r5, [r2, #8]!
  18: 3301       adds   r3, #1
  1a: 1940       adds   r0, r0, r5
  1c: eb41 71e5  adc.w  r1, r1, r5, asr #31
  20: e7f5       b.n    e <func1+0xe>

00000022 <test_func>:
  22: b51f       push   {r0, r1, r2, r3, r4, lr}
  24: 2200       movs   r2, #0
  26: 490c       ldr    r1, [pc, #48] ; (58 <test_func+0x36>)
  28: 2300       movs   r3, #0
  2a: e9cd 2300  strd   r2, r3, [sp]
  2e: e9cd 2302  strd   r2, r3, [sp, #8]
  32: 780c       ldrb   r4, [r1, #0]
  34: 7908       ldrb   r0, [r1, #4]
  36: 4623       mov    r3, r4
  38: 4302       orrs   r2, r0
  3a: e9cd 2300  strd   r2, r3, [sp]
  3e: 788a       ldrb   r2, [r1, #2]
  40: 2300       movs   r3, #0
  42: 4668       mov    r0, sp
  44: f043 0307  orr.w  r3, r3, #7
  48: 2102       movs   r1, #2
  4a: e9cd 2302  strd   r2, r3, [sp, #8]
  4e: f7ff fffe  bl     0 <func1>
  52: b004       add    sp, #16
  54: bd10       pop    {r4, pc}
  56: bf00       nop
  58: 00000000   .word  0x00000000
Created attachment 42016 [details]
obj made by '-Os -mtune=cortex-a9'

./gcc-linaro-6.3.1-2017.02-x86_64_arm-linux-gnueabi/bin/arm-linux-gnueabi-objdump -d memset_test_cortex-a9.o

memset_test_cortex-a9.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <func1>:
   0: b530       push   {r4, r5, lr}
   2: f1a0 0208  sub.w  r2, r0, #8
   6: 460c       mov    r4, r1
   8: 2300       movs   r3, #0
   a: 2000       movs   r0, #0
   c: 2100       movs   r1, #0
   e: 42a3       cmp    r3, r4
  10: db00       blt.n  14 <func1+0x14>
  12: bd30       pop    {r4, r5, pc}
  14: f852 5f08  ldr.w  r5, [r2, #8]!
  18: 3301       adds   r3, #1
  1a: 1940       adds   r0, r0, r5
  1c: eb41 71e5  adc.w  r1, r1, r5, asr #31
  20: e7f5       b.n    e <func1+0xe>

00000022 <test_func>:
  22: b51f       push   {r0, r1, r2, r3, r4, lr}
  24: 2210       movs   r2, #16
  26: 2100       movs   r1, #0
  28: 4668       mov    r0, sp
  2a: f7ff fffe  bl     0 <memset>
  2e: 490a       ldr    r1, [pc, #40] ; (58 <test_func+0x36>)
  30: 2200       movs   r2, #0
  32: 780c       ldrb   r4, [r1, #0]
  34: 7908       ldrb   r0, [r1, #4]
  36: 4623       mov    r3, r4
  38: 4302       orrs   r2, r0
  3a: 4668       mov    r0, sp
  3c: e9cd 2300  strd   r2, r3, [sp]
  40: 2300       movs   r3, #0
  42: 788a       ldrb   r2, [r1, #2]
  44: f043 0307  orr.w  r3, r3, #7
  48: 2102       movs   r1, #2
  4a: e9cd 2302  strd   r2, r3, [sp, #8]
  4e: f7ff fffe  bl     0 <func1>
  52: b004       add    sp, #16
  54: bd10       pop    {r4, pc}
  56: bf00       nop
  58: 00000000   .word  0x00000000
Created attachment 42017 [details]
obj made by '-Os -mtune=cortex-a12'

./gcc-linaro-6.3.1-2017.02-x86_64_arm-linux-gnueabi/bin/arm-linux-gnueabi-objdump -d memset_test_cortex-a12.o

memset_test_cortex-a12.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <func1>:
   0: b530       push   {r4, r5, lr}
   2: f1a0 0208  sub.w  r2, r0, #8
   6: 460c       mov    r4, r1
   8: 2300       movs   r3, #0
   a: 2000       movs   r0, #0
   c: 2100       movs   r1, #0
   e: 42a3       cmp    r3, r4
  10: db00       blt.n  14 <func1+0x14>
  12: bd30       pop    {r4, r5, pc}
  14: f852 5f08  ldr.w  r5, [r2, #8]!
  18: 3301       adds   r3, #1
  1a: 1940       adds   r0, r0, r5
  1c: eb41 71e5  adc.w  r1, r1, r5, asr #31
  20: e7f5       b.n    e <func1+0xe>

00000022 <test_func>:
  22: b51f       push   {r0, r1, r2, r3, r4, lr}
  24: 2200       movs   r2, #0
  26: 490c       ldr    r1, [pc, #48] ; (58 <test_func+0x36>)
  28: 2300       movs   r3, #0
  2a: e9cd 2300  strd   r2, r3, [sp]
  2e: e9cd 2302  strd   r2, r3, [sp, #8]
  32: 780c       ldrb   r4, [r1, #0]
  34: 7908       ldrb   r0, [r1, #4]
  36: 4623       mov    r3, r4
  38: 4302       orrs   r2, r0
  3a: e9cd 2300  strd   r2, r3, [sp]
  3e: 788a       ldrb   r2, [r1, #2]
  40: 2300       movs   r3, #0
  42: 4668       mov    r0, sp
  44: f043 0307  orr.w  r3, r3, #7
  48: 2102       movs   r1, #2
  4a: e9cd 2302  strd   r2, r3, [sp, #8]
  4e: f7ff fffe  bl     0 <func1>
  52: b004       add    sp, #16
  54: bd10       pop    {r4, pc}
  56: bf00       nop
  58: 00000000   .word  0x00000000
> Different tuning. Though maybe at -Os should be almost the same except for the allowance for using the instructions that are in cortex-a12 rather than a9 (for the -mcpu case).

I attached .o files made with '-mtune=cortex-a9' and '-mtune=cortex-a12' (the -mcpu case gives the same result).
Could you describe in more detail why memset is added for cortex-a9 or below?

memset_test_cortex-a9.o: file format elf32-littlearm

Disassembly of section .text:
...
00000022 <test_func>:
  22: b51f       push   {r0, r1, r2, r3, r4, lr}
  24: 2210       movs   r2, #16
  26: 2100       movs   r1, #0
  28: 4668       mov    r0, sp
  2a: f7ff fffe  bl     0 <memset>

> But really memset is part of the C standard here and you don't use -fno-hoisting option;

Which option do you mean? (I'm sorry, but -fno-hoisting is not found.)

> Not a bug, see PR 63393 comment #5 for explanation of why.

This is not related to freestanding implementations. Again, the only difference in the options is '-mcpu or -mtune'.

(1) CFLAGS: -Os -mtune=cortex-a9
(CC) memset_test.o
(CC) main.o
gcc-linaro-6.3.1-2017.02-x86_64_arm-linux-gnueabi/bin/arm-linux-gnueabi-ld -Bstatic -o memset_test \
memset_test.o main.o \
--start-group -L/home/dongkyun.s/tmp/memset_test/gcc-linaro-6.3.1-2017.02-x86_64_arm-linux-gnueabi/bin/../lib/gcc/arm-linux-gnueabi/6.3.1 -lgcc --end-group -Map memset_test.map #--gc-sections
memset_test.c:(.text+0x2a): undefined reference to `memset'

(2) CFLAGS: -Os -mtune=cortex-a12
(CC) memset_test.o
(CC) main.o
gcc-linaro-6.3.1-2017.02-x86_64_arm-linux-gnueabi/bin/arm-linux-gnueabi-ld -Bstatic -o memset_test \
memset_test.o main.o \
--start-group -L/home/dongkyun.s/tmp/memset_test/gcc-linaro-6.3.1-2017.02-x86_64_arm-linux-gnueabi/bin/../lib/gcc/arm-linux-gnueabi/6.3.1 -lgcc --end-group -Map memset_test.map #--gc-sections
BUILD_TARGETS=memset_test.bin memset_test.txt memset_test.dis memset_test.ver
Build Done!
> This is not related to freestanding implementations.

Huh? Since you are not linking against the C library, it has to be. Or do you mean this should be optimized not to use memset? That is a different question from what your summary is about.
> or you mean this should be optimized not to use memset; different question from what your summary is about.

I mean that neither -ffreestanding nor -fno-freestanding is used in this testcase; the only difference is the -mtune/-mcpu option.

Thanks!
(In reply to dongkyun.s from comment #9)
> I mean -ffreestanding or -fno-freestanding are not included in this
> testcase, but, mtune/ mcpu option.

Yes, but your summary said memset was missing, which is not correct and would make this bug report invalid. In reality you are complaining that the memset was not needed in the first place: why is it used for -mtune=cortex-a9 when -mtune=cortex-a12 can get away without it? Two different issues :).
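
Note on the first of the two issues: the GCC manual states that even a freestanding environment must provide memcpy, memmove, memset and memcmp, because the compiler may emit calls to them on its own. A minimal sketch of the kind of "trivial memset" mentioned in comment #0 could look like the following. The name trivial_memset is hypothetical (chosen so the snippet can be compiled next to a hosted libc); in a build without libc the function would have to be named memset. The volatile qualifier is an assumption, intended to discourage the optimizer from recognising the loop and turning it back into a memset call.

```c
#include <stddef.h>

/* Hypothetical helper: byte-wise fill with the same contract as memset().
 * In a build without libc this would be exported as `memset` itself. */
void *trivial_memset(void *dst, int value, size_t count)
{
    /* Assumption: volatile discourages the compiler from recognising
     * this loop and replacing it with a call to memset. */
    volatile unsigned char *p = dst;

    /* Store the low byte of value into each of the count bytes. */
    while (count--)
        *p++ = (unsigned char)value;

    return dst;
}
```

This only addresses the link error; it does not explain why the call is emitted for -mtune=cortex-a9 and not for -mtune=cortex-a12.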
Dear pinskia@gcc.gnu.org,

Thanks for correcting the title to "memset called when it does not need to be; -mtune=cortex-a9" along with the comment :)
Confirmed the call on 6.4.1 but GCC 7 and trunk don't generate the call for -mcpu=cortex-a9 . I don't know off the top of my head what change fixed this though.
> Confirmed the call on 6.4.1 but GCC 7 and trunk don't generate the call for -mcpu=cortex-a9 .

I also verified that the memset call is not generated with GCC 7.1 and "-mcpu=cortex-a9 or -mtune=cortex-a9" or lower.

It seems interesting that

in GCC 6,
- the memset call is not generated for -mcpu=cortex-a12 or higher (e.g. cortex-a15, V7 big.LITTLE)
- the memset call is always generated for -mcpu=cortex-a9 or lower (e.g. cortex-a7, cortex-a5)

in GCC 7.1,
- the memset call is never generated (even with V3 architecture processors, e.g. -mcpu=arm7)
(In reply to dongkyun.s from comment #13)
> > Confirmed the call on 6.4.1 but GCC 7 and trunk don't generate the call for -mcpu=cortex-a9 .
>
> I also verified memset call is not generated with GCC 7.1 + "-mcpu=cortex-a9
> or -mtune=cortex-a9" or lower.
>
> It seems interesting that
> in GCC6,
> - don't generate the memset call for -mcpu=cortex-a12 or higher(e.g,
> cortex-a15, V7 big.LITTLE)
> - always generate the memset call for -mcpu=cortex-a9 or lower(e.g,
> cortex-a7, cortex-a5)
>
> in GCC7.1
> - always don't generate the memset call (even with V3 Architecture
> Processors. e.g, -mcpu=arm7)

There's nothing in the compiler that explicitly says: use memset for these cores and not for others. The choice will be down to available instructions and their relative costs.
> There's nothing in the compiler that explicitly says: use memset for these cores and not for others. The choice will be down to available instructions and their relative costs.

Agreed, but I'm just wondering why the behavior differs between GCC versions with -Os. (The result should be the same if the choice is made by the instructions and their costs.)
> Agreed, but, I'm just wondering why it has different behavior according by
> GCC version with -Os. (It should be same result if the choice is made by
> their instructions and costs)

I think that for this example GCC 7 generates memset() call after changes in tree-ssa-dse
https://gcc.gnu.org/viewcvs/gcc?limit_changes=0&view=revision&revision=244442
(tuning is the same)

For the reduced test:

$ cat memset_test_reduced.c
long long func1( long long *pl)
{
    long long r = 0;
    for(int i=0; i<2; i++)
        r += ( long long ) pl[i];
    return r;
}

long long test_func(void)
{
    long long x[2] = {0};
    x[0] = 3;
    x[1] = 4;
    return func1(x);
}

compiled with:

gcc -S memset_test_reduced.c -g -mabi=aapcs -fno-function-sections -Wall -mfloat-abi=soft -Os -mtune=cortex-a9 -fdump-tree-all

The difference between the GIMPLE produced by gcc-6.4.1 and gcc-7.2.1 is that gcc-7 optimized out "x = {};" in the first DSE pass:

diff -u 6.4.1/memset_test_reduced.c.210t.optimized 7.2.1/memset_test_reduced.c.227t.optimized
...
 test_func ()
 {
   long long int x[2];
-  long long int _5;
+  long long int _4;

-  <bb 2>:
==============
-  x = {};
==============
+  <bb 2> [100.00%]:
   x[0] = 3;
   x[1] = 4;
-  _5 = func1 (&x);
+  _4 = func1 (&x);
   x ={v} {CLOBBER};
-  return _5;
+  return _4;
 }

Gcc-6 keeps it and transforms it into memset() (according to tune options?) after all.
> I think that for this example GCC 7 generates memset() call after changes in
> tree-ssa-dse

I mean GCC 6 generates memset()
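
The dead zero-initialisation in the reduced test can also be avoided at the source level. The following is only a sketch, assuming the intent of the test is simply to pass {3, 4} to func1. test_func_direct is a hypothetical name, not part of the attachment; it initialises the array with its final values, so there is no separate "x = {}" store left for DSE to remove. Whether a given GCC version and tuning emits a memset call for other initialisers is not settled by this.

```c
/* Same summation helper as in the reduced test. */
long long func1(long long *pl)
{
    long long r = 0;
    for (int i = 0; i < 2; i++)
        r += pl[i];
    return r;
}

/* Original shape: zero-initialise, then overwrite every element.
 * The "= {0}" store is dead because both elements are rewritten. */
long long test_func(void)
{
    long long x[2] = {0};
    x[0] = 3;
    x[1] = 4;
    return func1(x);
}

/* Hypothetical variant: initialise with the final values directly,
 * so no whole-array zeroing appears in the source. */
long long test_func_direct(void)
{
    long long x[2] = {3, 4};
    return func1(x);
}
```

Both functions return the same value; comparing their assembly output with -Os -mtune=cortex-a9 on GCC 6 would show whether the memset call disappears for the second form.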
Dear Michail,

Your analysis was very helpful. I've also verified that the compiler may or may not insert a memset() call depending on:
1) DSE optimization (object size/base and tune options)
2) code generation.