Created attachment 38255 [details] qla_attr.i.gz The linux kernel has a new tool named "objtool" which follows all possible code paths for every .o file, looking for abnormalities. In one rare case it has discovered a corrupt truncated function. From the disassembly of drivers/scsi/qla2xxx/qla_attr.o: 0000000000002f53 <qla2x00_get_host_fabric_name>: 2f53: 55 push %rbp 2f54: 48 89 e5 mov %rsp,%rbp 0000000000002f57 <qla2x00_get_fc_host_stats>: 2f57: 55 push %rbp 2f58: b9 e8 00 00 00 mov $0xe8,%ecx 2f5d: 48 89 e5 mov %rsp,%rbp ... Note that qla2x00_get_host_fabric_name() is inexplicably truncated after setting up the frame pointer. It falls through to the next function, which is very bad. I can recreate it with gcc 5.3.1 or gcc 6.0 on the upstream Linux kernel at tag v4.6-rc3. The call chain which appears to trigger the problem is: qla2x00_get_host_fabric_name() wwn_to_u64() get_unaligned_be64() be64_to_cpup() __be64_to_cpup() It occurs with the combination of the following two recent Linux commits: - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations") - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access") The gzipped .i file is attached. I'll also attach the kernel .config file.
Created attachment 38256 [details] Linux kernel config
$ gcc -Wp,-MD,drivers/scsi/qla2xxx/.qla_attr.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/5.3.1/include -I./arch/x86/include -Iarch/x86/include/generated/uapi -Iarch/x86/include/generated -Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I./include/uapi -Iinclude/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_X86_X32_ABI -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Os -Wno-maybe-uninitialized --param=allow-store-data-races=0 -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -DCC_HAVE_ASM_GOTO -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(qla_attr)" -D"KBUILD_MODNAME=KBUILD_STR(qla2xxx)" -c -o drivers/scsi/qla2xxx/qla_attr.o drivers/scsi/qla2xxx/qla_attr.c $ gcc -v Using built-in specs. COLLECT_GCC=/usr/bin/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/5.3.1/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-default-libstdcxx-abi=gcc4-compatible --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)
I can reproduce it with: $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/5.3.1/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC) No fancy compiler flags are necessary to thigger it. Without "-fno-omit-frame-pointer", function loses its two remaining insns, I see an empty body: .type qla2x00_get_host_fabric_name, @function qla2x00_get_host_fabric_name: .LFB4504: .cfi_startproc .cfi_endproc .LFE4504: .size qla2x00_get_host_fabric_name, .-qla2x00_get_host_fabric_name Simple "gcc -Os qla_attr.i.c -S" would do. gcc -O2 produces a normally-looking function.
Shorter reproducer: typedef __signed__ char __s8; typedef unsigned char __u8; typedef __signed__ short __s16; typedef unsigned short __u16; typedef __signed__ int __s32; typedef unsigned int __u32; __extension__ typedef __signed__ long long __s64; __extension__ typedef unsigned long long __u64; typedef signed char s8; typedef unsigned char u8; typedef signed short s16; typedef unsigned short u16; typedef signed int s32; typedef unsigned int u32; typedef signed long long s64; typedef unsigned long long u64; typedef __u64 __be64; static inline __attribute__((no_instrument_function)) __attribute__((__const__)) __u64 __fswab64(__u64 val) { return __builtin_bswap64(val); } static inline __attribute__((no_instrument_function)) __attribute__((always_inline)) __u64 __swab64p(const __u64 *p) { return (__builtin_constant_p((__u64)(*p)) ? ((__u64)( (((__u64)(*p) & (__u64)0x00000000000000ffULL) << 56) | (((__u64)(*p) & (__u64)0x000000000000ff00ULL) << 40) | (((__u64)(*p) & (__u64)0x0000000000ff0000ULL) << 24) | (((__u64)(*p) & (__u64)0x00000000ff000000ULL) << 8) | (((__u64)(*p) & (__u64)0x000000ff00000000ULL) >> 8) | (((__u64)(*p) & (__u64)0x0000ff0000000000ULL) >> 24) | (((__u64)(*p) & (__u64)0x00ff000000000000ULL) >> 40) | (((__u64)(*p) & (__u64)0xff00000000000000ULL) >> 56))) : __fswab64(*p)); } static inline __attribute__((no_instrument_function)) __attribute__((always_inline)) __u64 __be64_to_cpup(const __be64 *p) { return __swab64p((__u64 *)p); } static inline __attribute__((no_instrument_function)) __attribute__((always_inline)) u64 get_unaligned_be64(const void *p) { return __be64_to_cpup((__be64 *)p); } static inline __attribute__((no_instrument_function)) u64 wwn_to_u64(u8 *wwn) { return get_unaligned_be64(wwn); } struct Scsi_Host { unsigned long base; unsigned long io_port; unsigned char n_io_port; unsigned char dma_channel; unsigned int irq; void *shost_data; unsigned long hostdata[0] __attribute__ ((aligned (sizeof(unsigned long)))); }; static inline __attribute__((no_instrument_function)) void *shost_priv(struct Scsi_Host *shost) { return (void *)shost->hostdata; } typedef struct scsi_qla_host { u8 fabric_node_name[8]; u32 device_flags; } scsi_qla_host_t; struct fc_host_attrs { u64 node_name; u64 port_name; u64 permanent_port_name; u32 supported_classes; u8 supported_fc4s[32]; u32 supported_speeds; u32 maxframe_size; u16 max_npiv_vports; char serial_number[80]; char manufacturer[80]; char model[256]; char model_description[256]; char hardware_version[64]; char driver_version[64]; char firmware_version[64]; char optionrom_version[64]; u32 port_id; u8 active_fc4s[32]; u32 speed; u64 fabric_name; }; static void qla2x00_get_host_fabric_name(struct Scsi_Host *shost) { scsi_qla_host_t *vha = shost_priv(shost); u8 node_name[8] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}; u64 fabric_name = wwn_to_u64(node_name); if (vha->device_flags & 0x1) fabric_name = wwn_to_u64(vha->fabric_node_name); (((struct fc_host_attrs *)(shost)->shost_data)->fabric_name) = fabric_name; } void *get_host_fabric_name = qla2x00_get_host_fabric_name;
Even smaller reproducer. Bug disappears if "__attribute__((always_inline))" is removed everywhere. typedef unsigned char u8; typedef unsigned int u32; typedef unsigned long long u64; static inline __attribute__((__const__)) u64 __fswab64(u64 val) { return __builtin_bswap64(val); } static inline __attribute__((always_inline)) u64 __swab64p(const u64 *p) { return (__builtin_constant_p((u64)(*p)) ? ((u64)( (((u64)(*p) & (u64)0x00000000000000ffULL) << 56) | (((u64)(*p) & (u64)0x000000000000ff00ULL) << 40) | (((u64)(*p) & (u64)0x0000000000ff0000ULL) << 24) | (((u64)(*p) & (u64)0x00000000ff000000ULL) << 8) | (((u64)(*p) & (u64)0x000000ff00000000ULL) >> 8) | (((u64)(*p) & (u64)0x0000ff0000000000ULL) >> 24) | (((u64)(*p) & (u64)0x00ff000000000000ULL) >> 40) | (((u64)(*p) & (u64)0xff00000000000000ULL) >> 56))) : __fswab64(*p)); } static inline __attribute__((always_inline)) u64 __be64_to_cpup(const u64 *p) { return __swab64p((u64 *)p); } static inline __attribute__((always_inline)) u64 get_unaligned_be64(const void *p) { return __be64_to_cpup((u64 *)p); } static inline u64 wwn_to_u64(u8 *wwn) { return get_unaligned_be64(wwn); } struct Scsi_Host { void *shost_data; unsigned long hostdata[0]; }; static inline void *shost_priv(struct Scsi_Host *shost) { return (void *)shost->hostdata; } typedef struct scsi_qla_host { u8 fabric_node_name[8]; u32 device_flags; } scsi_qla_host_t; struct fc_host_attrs { u64 fabric_name; }; static void qla2x00_get_host_fabric_name(struct Scsi_Host *shost) { scsi_qla_host_t *vha = shost_priv(shost); u8 node_name[8] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}; u64 fabric_name = wwn_to_u64(node_name); if (vha->device_flags & 0x1) fabric_name = wwn_to_u64(vha->fabric_node_name); (((struct fc_host_attrs *)(shost)->shost_data)->fabric_name) = fabric_name; } void *get_host_fabric_name = qla2x00_get_host_fabric_name;
I can collapse the chain of inlines down to this and still see the bug. Removing "__attribute__((always_inline))", or merging __swab64p() and wwn_to_u64(), makes bug disappear. typedef unsigned char u8; typedef unsigned int u32; typedef unsigned long long u64; static inline __attribute__((always_inline)) u64 __swab64p(const u64 *p) { return (__builtin_constant_p((u64)(*p)) ? ((u64)( (((u64)(*p) & (u64)0x00000000000000ffULL) << 56) | (((u64)(*p) & (u64)0x000000000000ff00ULL) << 40) | (((u64)(*p) & (u64)0x0000000000ff0000ULL) << 24) | (((u64)(*p) & (u64)0x00000000ff000000ULL) << 8) | (((u64)(*p) & (u64)0x000000ff00000000ULL) >> 8) | (((u64)(*p) & (u64)0x0000ff0000000000ULL) >> 24) | (((u64)(*p) & (u64)0x00ff000000000000ULL) >> 40) | (((u64)(*p) & (u64)0xff00000000000000ULL) >> 56))) : __builtin_bswap64(*p)); } static inline u64 wwn_to_u64(void *wwn) { return __swab64p(wwn); } struct Scsi_Host { void *shost_data; unsigned long hostdata[0]; }; static inline void *shost_priv(struct Scsi_Host *shost) { return (void *)shost->hostdata; } typedef struct scsi_qla_host { u8 fabric_node_name[8]; u32 device_flags; } scsi_qla_host_t; struct fc_host_attrs { u64 fabric_name; }; static void qla2x00_get_host_fabric_name(struct Scsi_Host *shost) { scsi_qla_host_t *vha = shost_priv(shost); u8 node_name[8] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}; u64 fabric_name = wwn_to_u64(node_name); if (vha->device_flags & 0x1) fabric_name = wwn_to_u64(vha->fabric_node_name); (((struct fc_host_attrs *)(shost)->shost_data)->fabric_name) = fabric_name; } void *get_host_fabric_name = qla2x00_get_host_fabric_name;
Following code aborts on x86_64 4.9.2 and 5.3.0 at -O2, at least: #pragma GCC optimize("no-unit-at-a-time") typedef unsigned char u8; typedef unsigned long long u64; static inline __attribute__((always_inline)) u64 __swab64p(const u64 *p) { return (__builtin_constant_p((u64)(*p)) ? ((u64)( (((u64)(*p) & (u64)0x00000000000000ffULL) << 56) | (((u64)(*p) & (u64)0x000000000000ff00ULL) << 40) | (((u64)(*p) & (u64)0x0000000000ff0000ULL) << 24) | (((u64)(*p) & (u64)0x00000000ff000000ULL) << 8) | (((u64)(*p) & (u64)0x000000ff00000000ULL) >> 8) | (((u64)(*p) & (u64)0x0000ff0000000000ULL) >> 24) | (((u64)(*p) & (u64)0x00ff000000000000ULL) >> 40) | (((u64)(*p) & (u64)0xff00000000000000ULL) >> 56))) : __builtin_bswap64(*p)); } static inline u64 wwn_to_u64(void *wwn) { return __swab64p(wwn); } void __attribute__((noinline,noclone)) broken(u64* shost) { u8 node_name[8] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}; *shost = wwn_to_u64(node_name); } void __attribute__((noinline,noclone)) dummy(void) { __builtin_abort(); } int main(int argc, char* argv[]) { u64 v; broken(&v); if(v != (u64)-1) __builtin_abort(); return 0; }
Confirmed. ;; Function broken (broken, funcdef_no=2, decl_uid=1764, cgraph_uid=2, symbol_order=2) (executed once) __attribute__((noclone, noinline)) broken (u64 * shost) { <bb 2>: __builtin_unreachable (); } after inlining we see: <bb 2>: node_name[0] = 255; node_name[1] = 255; node_name[2] = 255; node_name[3] = 255; node_name[4] = 255; node_name[5] = 255; node_name[6] = 255; node_name[7] = 255; _14 = MEM[(const u64 *)&node_name]; _15 = __builtin_constant_p (_14); if (_15 != 0) goto <bb 3>; else goto <bb 4>; ... <bb 4>: iftmp.0_37 = _39(D); __builtin_unreachable (); which seems to be introduced by IPA CP: __builtin_unreachable/9 (__builtin_unreachable) @0x7f1563274cf0 Type: function Visibility: external public References: Referring: Availability: not_available First run: 0 Function flags: Called by: wwn_to_u64.constprop.0/8 Calls: wwn_to_u64.constprop.0/8 (wwn_to_u64.constprop) @0x7f1563274000 Type: function definition analyzed Visibility: References: Referring: Function wwn_to_u64.constprop/8 is inline copy in broken/2 Clone of wwn_to_u64/1 Availability: local First run: 0 Function flags: local Called by: broken/2 (inlined) (1.00 per call) Calls: __builtin_constant_p/5 (1.00 per call) __builtin_unreachable/9
Note that the testcase is strictly invalid as it has 8-byte aligned node_name[] passed to wwn_to_u64 which ends up performing a 64bit load on it via __swab64p.
Fixing that with void __attribute__((noinline,noclone)) broken(u64* shost) { u8 node_name[8] __attribute__((aligned(__alignof(u64)))) = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}; *shost = wwn_to_u64(node_name); } still shows broken code. Note __swab64p is a quite stupid implementation assuming GCC can't constant-fold __builtin_bswap64.
Happens via (gdb) bt #0 redirect_to_unreachable (e=0x7ffff6891820) at /space/rguenther/src/svn/trunk3/gcc/ipa-inline-analysis.c:735 #1 0x0000000000bc9325 in edge_set_predicate (e=0x7ffff6891820, predicate=0x7fffffffc980) at /space/rguenther/src/svn/trunk3/gcc/ipa-inline-analysis.c:768 #2 0x0000000000bd1df7 in remap_edge_summaries (inlined_edge=0x7ffff68912d8, node=<cgraph_node* 0x7ffff69b1000 "wwn_to_u64.constprop">, info=0x7ffff6994ba0, callee_info=0x7ffff6994c60, operand_map=..., offset_map=..., possible_truths=0, toplev_predicate=0x7fffffffca30) at /space/rguenther/src/svn/trunk3/gcc/ipa-inline-analysis.c:3500 #3 0x0000000000bd2603 in inline_merge_summary (edge=0x7ffff68912d8) at /space/rguenther/src/svn/trunk3/gcc/ipa-inline-analysis.c:3647 #4 0x00000000017f2b84 in inline_call (e=0x7ffff68912d8, update_original=true, Python Exception <class 'gdb.error'> There is no member or method named m_vecpfx.: new_edges=0x7fffffffd9d0, overall_size=0x25c63e0 <_ZL12overall_size>, update_overall_summary=true, callee_removed=0x0) at /space/rguenther/src/svn/trunk3/gcc/ipa-inline-transform.c:370 #5 0x00000000017e8953 in inline_small_functions () at /space/rguenther/src/svn/trunk3/gcc/ipa-inline.c:2022 #6 0x00000000017e9ca7 in ipa_inline () so somehow it get's confused: BB 2 predicate:(true) _3 = MEM[(const u64 *)wwn_2(D)]; freq:1.00 size: 1 time: 1 50% will be eliminated by inlining Accounting size:0.50, time:0.50 on new predicate:(op0[ref offset: 0] changed) && (not inlined) Accounting size:0.50, time:0.50 on new predicate:(op0[ref offset: 0] changed) _4 = __builtin_constant_p (_3); freq:1.00 size: 0 time: 0 if (_4 != 0) freq:1.00 size: 2 time: 2 Accounting size:2.00, time:2.00 on predicate:(op0[ref offset: 0] changed) BB 4 predicate:(op0[ref offset: 0] not constant) iftmp.0_26 = __builtin_bswap64 (_3); freq:0.61 size: 1 time: 1 BB 3 predicate:(true) _5 = _3 << 56; (BB 3 predicate shouldn't be true) Honza? This seems to be somewhat fragile (redirecting things to unreachable but _not_ changing the actual predicates in the IL). Claiming the predicate is constant true is also a bit bogus (as can be seen in following optimization).
It was caused by r208831.
So if I understand correctly, some reachable code is incorrectly getting marked unreachable and then getting discarded. Interestingly, the function's epilogue (frame pointer restore) and return instruction are also getting discarded. Can you tell if that will always be the case when this bug triggers? If so, that should make it possible for objtool to reliably detect the bug.
(In reply to Josh Poimboeuf from comment #13) > So if I understand correctly, some reachable code is incorrectly getting > marked unreachable and then getting discarded. > > Interestingly, the function's epilogue (frame pointer restore) and return > instruction are also getting discarded. Can you tell if that will always be > the case when this bug triggers? I don't think we can rely on that. The path could simply fall thru to a random instruction which is still inside the function. Say, if it was if (x) ... else ... foo (); then the arm marked unreachable would simply disappear. > If so, that should make it possible for objtool to reliably detect the bug.
Even with the #c10 change it looks to be aliasing violation, does it work fine if you compile with -fno-strict-aliasing or use may_alias on the type through which it is read?
> Honza? This seems to be somewhat fragile (redirecting things to unreachable > but _not_ changing the actual predicates in the IL). Claiming the > predicate is constant true is also a bit bogus (as can be seen in following > optimization). At WPA stage inliner can not update IL, so it simply redirects to builtin_unreachable if predicate is FALSE and expects scalar passes to be monotonously smarter than IPA analysis. BB3 predicate true is conservatively correct: there is no predicate to represent that something is not constant because we assume scalar optimizers to possibly be stronger than us and decide better. Here we manage to assume that builtin_constant_p is true and thus we conclude the following BB is unreachable: BB 4 predicate:(op0[ref offset: 0] not constant) iftmp.0_26 = __builtin_bswap64 (_3); freq:0.61 size: 1 time: 1 This is because ipa-prop analyzes the values in node_name: Jump functions of caller broken/2: callsite broken/2 -> wwn_to_u64/1 : param 0: UNKNOWN Aggregate passed by reference: offset: 0, cst: 255 offset: 8, cst: 255 offset: 16, cst: 255 offset: 24, cst: 255 offset: 32, cst: 255 offset: 40, cst: 255 offset: 48, cst: 255 offset: 56, cst: 255 but later we fold: <bb 2>: node_name[0] = 255; node_name[1] = 255; node_name[2] = 255; node_name[3] = 255; node_name[4] = 255; node_name[5] = 255; node_name[6] = 255; node_name[7] = 255; _14 = MEM[(const u64 *)&node_name]; _15 = __builtin_constant_p (_14); into false (which is stupid, but conservatively correct modulo the assumption about IPA passes being consistently weakter than local passes). Making IPA passes to fold builtin_constant_p call into true is technically possible, if I add builtin_true and redirect to it, but it would not be very good if the local passes can not fold above (as the wrong arm is taken). So can we possibly fix the local passes? Honza
(In reply to Richard Biener from comment #14) > (In reply to Josh Poimboeuf from comment #13) > > Interestingly, the function's epilogue (frame pointer restore) and return > > instruction are also getting discarded. Can you tell if that will always be > > the case when this bug triggers? > > I don't think we can rely on that. The path could simply fall thru to > a random instruction which is still inside the function. Say, if it was > > if (x) > ... > else > ... > foo (); > > then the arm marked unreachable would simply disappear. I tried doing that by putting the offending wwn_to_u64() call in an if statement, but then the bug disappeared. In fact, adding *any* code before the call seems to make the bug disappear.
Jakub: There is indeed aliasing issue, but with -fno-strict-aliasing the bug is the same. Apparently this is ipa-prop bug, because ipa-prop does not track size of accesses and thus it does not know there is a mismatch. The value produced is thus not INT_MAX as intended but 255. This is Martin's area.
Josh: This is limitation of ipa-prop tracking. It very easily gives up on determinging constantness of aggregate parameter. Hope Martin will fix it next stage1. WIP patches was done few releases back but not merged so far.
Thanks very much to everyone who has looked into this so far. It would be very helpful to get answers to the following questions, so we can understand the impact to the kernel: 1) Is there a reliable way to avoid the bug, either in code or with a gcc flag? 2) Is there a reliable way to detect the bug by looking at the object code?
On April 15, 2016 3:31:33 PM GMT+02:00, "hubicka at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 > >--- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> --- >> Honza? This seems to be somewhat fragile (redirecting things to >unreachable >> but _not_ changing the actual predicates in the IL). Claiming the >> predicate is constant true is also a bit bogus (as can be seen in >following >> optimization). > >At WPA stage inliner can not update IL, so it simply redirects to >builtin_unreachable if predicate is FALSE and expects scalar passes to >be >monotonously smarter than IPA analysis. > >BB3 predicate true is conservatively correct: there is no predicate to >represent that something is not constant because we assume scalar >optimizers to >possibly be stronger than us and decide better. > >Here we manage to assume that builtin_constant_p is true and thus we >conclude >the following BB is unreachable: >BB 4 predicate:(op0[ref offset: 0] not constant) > >iftmp.0_26 = __builtin_bswap64 (_3); > > freq:0.61 size: 1 time: 1 >This is because ipa-prop analyzes the values in node_name: > >Jump functions of caller broken/2: > >callsite broken/2 -> wwn_to_u64/1 : > >param 0: UNKNOWN > >Aggregate passed by reference: > offset: 0, cst: 255 > offset: 8, cst: 255 > offset: 16, cst: 255 > offset: 24, cst: 255 > offset: 32, cst: 255 > offset: 40, cst: 255 > offset: 48, cst: 255 > offset: 56, cst: 255 > >but later we fold: > ><bb 2>: > >node_name[0] = 255; > >node_name[1] = 255; > >node_name[2] = 255; > >node_name[3] = 255; > >node_name[4] = 255; > >node_name[5] = 255; > >node_name[6] = 255; > >node_name[7] = 255; > >_14 = MEM[(const u64 *)&node_name]; > >_15 = __builtin_constant_p (_14); > > >into false (which is stupid, but conservatively correct modulo the >assumption >about IPA passes being consistently weakter than local passes). > >Making IPA passes to fold builtin_constant_p call into true is >technically >possible, if I add builtin_true and redirect to it, but it would not be >very >good if the local passes can not fold above (as the wrong arm is >taken). > >So can we possibly fix the local passes?You can't rely on this happening, consider -fno-foo. So my original local hack was to not do this built-in constant stuff for is_refp Richard. >Honza
(In reply to Jan Hubicka from comment #18) > Jakub: There is indeed aliasing issue, but with -fno-strict-aliasing the bug > is the same. > > Apparently this is ipa-prop bug, because ipa-prop does not track size of > accesses and thus it does not know there is a mismatch. The value produced > is thus not INT_MAX as intended but 255. This is Martin's area. ipa-prop does not track accesses or their sizes, inlining predicate conditions do. Anyway, I checked the two places where access sizes need to be checked (i.e. ipcp_modif_dom_walker::before_dom_children in ipa-prop.c and evaluate_conditions_for_known_args in ipa-inline-analysis.c) and they actually are checked, with one exception causing this bug. The exception is evaluating IS_NOT_CONSTANT and CHANGED types of conditions in evaluate_conditions_for_known_args, which does not check access size for them because the size is not even tracked for them (for normal conditions, size is the size of the stored value to compare with). I suppose the easiest fix is to overload the value field to store the size of the access for these two codes and then add the missing check.
(In reply to Josh Poimboeuf from comment #20) > Thanks very much to everyone who has looked into this so far. It would be > very helpful to get answers to the following questions, so we can understand > the impact to the kernel: > > 1) Is there a reliable way to avoid the bug, either in code or with a gcc > flag? If you mean in general, then unfortunately no, except for disabling inlining altogether and removing all always_inline's. In addition to the main bug, I found out that I check --param ipa-max-agg-items only after incrementing it, so even setting that to zero does not help. I'll prepare a patch for that too. If you want to avoid it at a particular place where you know it already occurs, then it depends on how the code looks like. Would for the following work for you? void __attribute__((noinline,noclone)) broken(u64* shost) { u8 node_name[8]; memset (&node_name, 0xFF, sizeof (node_name)); *shost = wwn_to_u64(node_name); } > > 2) Is there a reliable way to detect the bug by looking at the object code? I cannot think of any.
(In reply to Martin Jambor from comment #23) > (In reply to Josh Poimboeuf from comment #20) > > Thanks very much to everyone who has looked into this so far. It would be > > very helpful to get answers to the following questions, so we can understand > > the impact to the kernel: > > > > 1) Is there a reliable way to avoid the bug, either in code or with a gcc > > flag? > > If you mean in general, then unfortunately no, except for disabling > inlining altogether and removing all always_inline's. In addition to > the main bug, I found out that I check --param ipa-max-agg-items only > after incrementing it, so even setting that to zero does not help. > I'll prepare a patch for that too. Yes, I'm looking for a general way to either prevent or try to detect potential other cases of the bug throughout the entire kernel. Can it only occur with the use of __builtin_constant_p(exp) by an inline function (where exp is a constant)?
(In reply to Martin Jambor from comment #22) > I suppose the easiest fix is to overload the value field to store the > size of the access for these two codes and then add the missing check. OK, so the IS_NOT_CONSTANT case is relatively easy, the following (untested) patch fixes this PR: diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index f8ca825..6901cfb 100644 --- a/gcc/ipa-inline-analysis.c +++ b/gcc/ipa-inline-analysis.c @@ -867,8 +867,14 @@ evaluate_conditions_for_known_args (struct cgraph_node *node, clause |= 1 << (i + predicate_first_dynamic_condition); continue; } - if (c->code == IS_NOT_CONSTANT || c->code == CHANGED) + if (c->code == CHANGED) continue; + if (c->code == IS_NOT_CONSTANT) + { + if (!operand_equal_p (c->val, TYPE_SIZE (TREE_TYPE (val)), 0)) + clause |= 1 << (i + predicate_first_dynamic_condition); + continue; + } if (operand_equal_p (TYPE_SIZE (TREE_TYPE (c->val)), TYPE_SIZE (TREE_TYPE (val)), 0)) @@ -1807,8 +1813,9 @@ set_cond_stmt_execution_predicate (struct ipa_func_body_info *fbi, return; FOR_EACH_EDGE (e, ei, bb->succs) if (e->flags & EDGE_FALSE_VALUE) { + tree opsize = unshare_expr_without_location (TYPE_SIZE (TREE_TYPE (op2))); struct predicate p = add_condition (summary, index, &aggpos, - IS_NOT_CONSTANT, NULL_TREE); + IS_NOT_CONSTANT, opsize); e->aux = edge_predicate_pool.allocate (); *(struct predicate *) e->aux = p; } On the other hand, the CHANGED part is more difficult because an unchanged value is represented with an error_mark_node which of course does not encode the size of the unchanged value. So if we wanted to handle it, we'd need to encode it differently. However, I am not convinced it is necessary because, unless I'm forgetting about something, it is only used as inlining heuristics and can never lead to any optimizations and thus neither to miscompilations. Am I right? In any event, I'd like to talk to Honza bout this before going any further.
(In reply to Josh Poimboeuf from comment #24) > > Yes, I'm looking for a general way to either prevent or try to detect > potential other cases of the bug throughout the entire kernel. > > Can it only occur with the use of __builtin_constant_p(exp) by an inline > function (where exp is a constant)? Yes and no. Yes, the problematic inlining analysis predicate condition type IS_NOT_CONSTANT is only created for conditions containing a call to __builtin_constant_p. But no, a function does not have to be explicitly marked inline to be an inline candidate. This bug can occur when an inlineable function containing a call to __builtin_constant_p, which checks a parameter or a value it references and a (possibly indirect) caller of the function actually passes a constant, but stores it using a type of a different size. In your testcase, the array initialization is a series of byte stores but the "load" in __swab64p is 64-bit. Note that gcc cannot detect that the parameter in your testcase was actually a compile-time constant. IPA level does by mistake and removes the slow path which is later selected for real (this can be improved if deemed worthwhile but IPA analysis still needs to check the size to be correct).
On April 15, 2016 11:58:39 PM GMT+02:00, "jamborm at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 > >--- Comment #26 from Martin Jambor <jamborm at gcc dot gnu.org> --- >(In reply to Josh Poimboeuf from comment #24) >> >> Yes, I'm looking for a general way to either prevent or try to detect >> potential other cases of the bug throughout the entire kernel. >> >> Can it only occur with the use of __builtin_constant_p(exp) by an >inline >> function (where exp is a constant)? > >Yes and no. Yes, the problematic inlining analysis predicate >condition type IS_NOT_CONSTANT is only created for conditions >containing a call to __builtin_constant_p. But no, a function does >not have to be explicitly marked inline to be an inline candidate. > >This bug can occur when an inlineable function containing a call to >__builtin_constant_p, which checks a parameter or a value it >references and a (possibly indirect) caller of the function actually >passes a constant, but stores it using a type of a different size. > >In your testcase, the array initialization is a series of byte stores >but the "load" in __swab64p is 64-bit. Note that gcc cannot detect >that the parameter in your testcase was actually a compile-time >constant. IPA level does by mistake and removes the slow path which >is later selected for real (this can be improved if deemed worthwhile >but IPA analysis still needs to check the size to be correct). I was arguing that if IPA proves a condition to true/false then it should adjust it that way in modification phase. This would have prevented the bug as well (eventually using the 'wrong' path). At least it would have avoided hitting a unreachable path at runtime which can be very tricky to debug.
(In reply to rguenther@suse.de from comment #27) > I was arguing that if IPA proves a condition to true/false then it > should adjust it that way in modification phase. The thing is that it does not prove it correctly. If we modify the testcase so that only the byte at offset zero is constant and others are not, inlining would still think the whole eight-byte access is to a known constant. Redirecting to some builtin_true or builtin_false would have made debugging easier, I agree.
Created attachment 38316 [details] Fix storing access size to conditions Honza asked me to also come up with a version of the patch where we store access size to the condition (as a HOST_WIDE_INT) and use that for access size comparisons, so that we avoid any potential confusion (for example if the loaded value is run through a NOP_EXPR after loading but before the described use). So this is it. It survives both regular and LTO bootstrap and testing on an x86_64-linux.
On Wed, 20 Apr 2016, jamborm at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 > > --- Comment #29 from Martin Jambor <jamborm at gcc dot gnu.org> --- > Created attachment 38316 [details] > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38316&action=edit > Fix storing access size to conditions > > Honza asked me to also come up with a version of the patch where we > store access size to the condition (as a HOST_WIDE_INT) and use that > for access size comparisons, so that we avoid any potential confusion > (for example if the loaded value is run through a NOP_EXPR after > loading but before the described use). > > So this is it. It survives both regular and LTO bootstrap and testing > on an x86_64-linux. Any reason it's not unsigned HOST_WIDE_INT size?
(In reply to rguenther@suse.de from comment #30) > > Any reason it's not unsigned HOST_WIDE_INT size? The only reason is to use the same type in which get_ref_base_and_extent returns size.
Author: jamborm Date: Wed May 18 13:04:23 2016 New Revision: 236389 URL: https://gcc.gnu.org/viewcvs?rev=236389&root=gcc&view=rev Log: [PR 70646] Store size to inlining predicate conditions 2016-05-18 Martin Jambor <mjambor@suse.cz> PR ipa/70646 * ipa-inline.h (condition): New field size. * ipa-inline-analysis.c (add_condition): New parameter SIZE, use it for comaprison and store it into the new condition. (evaluate_conditions_for_known_args): Use condition size to check access sizes for all but CHANGED conditions. (unmodified_parm_1): New parameter size_p, store access size into it. (unmodified_parm): Likewise. (unmodified_parm_or_parm_agg_item): Likewise. (eliminated_by_inlining_prob): Pass NULL to unmodified_parm as size_p. (set_cond_stmt_execution_predicate): Extract access sizes and store them to conditions. (set_switch_stmt_execution_predicate): Likewise. (will_be_nonconstant_expr_predicate): Likewise. (will_be_nonconstant_predicate): Likewise. (inline_read_section): Stream condition size. (inline_write_summary): Likewise. testsuite/ * gcc.dg/ipa/pr70646.c: New test. Added: trunk/gcc/testsuite/gcc.dg/ipa/pr70646.c Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-inline-analysis.c trunk/gcc/ipa-inline.h trunk/gcc/testsuite/ChangeLog
Author: jamborm Date: Wed May 18 13:06:24 2016 New Revision: 236390 URL: https://gcc.gnu.org/viewcvs?rev=236390&root=gcc&view=rev Log: Respect --param ipa-max-agg-items=0 2016-05-18 Martin Jambor <mjambor@suse.cz> PR ipa/70646 * ipa-prop.c (determine_locally_known_aggregate_parts): Bail out early if parameter PARAM_IPA_MAX_AGG_ITEMS is zero. Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-prop.c
Author: jamborm Date: Thu May 19 14:56:35 2016 New Revision: 236462 URL: https://gcc.gnu.org/viewcvs?rev=236462&root=gcc&view=rev Log: [PR 70646] Store size to inlining predicate conditions 2016-05-19 Martin Jambor <mjambor@suse.cz> PR ipa/70646 * ipa-inline.h (condition): New field size. * ipa-inline-analysis.c (add_condition): New parameter SIZE, use it for comaprison and store it into the new condition. (evaluate_conditions_for_known_args): Use condition size to check access sizes for all but CHANGED conditions. (unmodified_parm_1): New parameter size_p, store access size into it. (unmodified_parm): Likewise. (unmodified_parm_or_parm_agg_item): Likewise. (eliminated_by_inlining_prob): Pass NULL to unmodified_parm as size_p. (set_cond_stmt_execution_predicate): Extract access sizes and store them to conditions. (set_switch_stmt_execution_predicate): Likewise. (will_be_nonconstant_expr_predicate): Likewise. (will_be_nonconstant_predicate): Likewise. (inline_read_section): Stream condition size. (inline_write_summary): Likewise. * lto-streamer.h (LTO_minor_version): Bump. testsuite/ * gcc.dg/ipa/pr70646.c: New test. Added: branches/gcc-6-branch/gcc/testsuite/gcc.dg/ipa/pr70646.c Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/ipa-inline-analysis.c branches/gcc-6-branch/gcc/ipa-inline.h branches/gcc-6-branch/gcc/lto-streamer.h branches/gcc-6-branch/gcc/testsuite/ChangeLog
Author: jamborm Date: Thu May 19 15:00:12 2016 New Revision: 236463 URL: https://gcc.gnu.org/viewcvs?rev=236463&root=gcc&view=rev Log: Respect --param ipa-max-agg-items=0 2016-05-19 Martin Jambor <mjambor@suse.cz> PR ipa/70646 * ipa-prop.c (determine_locally_known_aggregate_parts): Bail out early if parameter PARAM_IPA_MAX_AGG_ITEMS is zero. Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/ipa-prop.c
Author: jamborm Date: Thu May 19 15:09:21 2016 New Revision: 236473 URL: https://gcc.gnu.org/viewcvs?rev=236473&root=gcc&view=rev Log: [PR 70646] Store size to inlining predicate conditions 2016-05-19 Martin Jambor <mjambor@suse.cz> PR ipa/70646 * ipa-inline.h (condition): New field size. * ipa-inline-analysis.c (add_condition): New parameter SIZE, use it for comaprison and store it into the new condition. (evaluate_conditions_for_known_args): Use condition size to check access sizes for all but CHANGED conditions. (unmodified_parm_1): New parameter size_p, store access size into it. (unmodified_parm): Likewise. (unmodified_parm_or_parm_agg_item): Likewise. (eliminated_by_inlining_prob): Pass NULL to unmodified_parm as size_p. (set_cond_stmt_execution_predicate): Extract access sizes and store them to conditions. (set_switch_stmt_execution_predicate): Likewise. (will_be_nonconstant_expr_predicate): Likewise. (will_be_nonconstant_predicate): Likewise. (inline_read_section): Stream condition size. (inline_write_summary): Likewise. * lto-streamer.h (LTO_minor_version): Bump. testsuite/ * gcc.dg/ipa/pr70646.c: New test. Added: branches/gcc-5-branch/gcc/testsuite/gcc.dg/ipa/pr70646.c Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/ipa-inline-analysis.c branches/gcc-5-branch/gcc/ipa-inline.h branches/gcc-5-branch/gcc/lto-streamer.h branches/gcc-5-branch/gcc/testsuite/ChangeLog
Author: jamborm Date: Thu May 19 15:10:50 2016 New Revision: 236474 URL: https://gcc.gnu.org/viewcvs?rev=236474&root=gcc&view=rev Log: Respect --param ipa-max-agg-items=0 2016-05-19 Martin Jambor <mjambor@suse.cz> PR ipa/70646 * ipa-prop.c (determine_locally_known_aggregate_parts): Bail out early if parameter PARAM_IPA_MAX_AGG_ITEMS is zero. Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/ipa-prop.c
Author: jamborm Date: Thu May 19 15:17:10 2016 New Revision: 236475 URL: https://gcc.gnu.org/viewcvs?rev=236475&root=gcc&view=rev Log: [PR 70646] Store size to inlining predicate conditions 2016-05-19 Martin Jambor <mjambor@suse.cz> PR ipa/70646 * ipa-inline.h (condition): New field size. * ipa-inline-analysis.c (add_condition): New parameter SIZE, use it for comaprison and store it into the new condition. (evaluate_conditions_for_known_args): Use condition size to check access sizes for all but CHANGED conditions. (unmodified_parm_1): New parameter size_p, store access size into it. (unmodified_parm): Likewise. (unmodified_parm_or_parm_agg_item): Likewise. (eliminated_by_inlining_prob): Pass NULL to unmodified_parm as size_p. (set_cond_stmt_execution_predicate): Extract access sizes and store them to conditions. (set_switch_stmt_execution_predicate): Likewise. (will_be_nonconstant_expr_predicate): Likewise. (will_be_nonconstant_predicate): Likewise. (inline_read_section): Stream condition size. (inline_write_summary): Likewise. * lto-streamer.h (LTO_minor_version): Bump. * ipa-prop.c (ipa_load_from_parm_agg): Added size_p parameter, pass it to ipa_load_from_parm_agg_1. * ipa-prop.h (ipa_load_from_parm_agg): Update declaration. testsuite/ * gcc.dg/ipa/pr70646.c: New test. Added: branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/ipa/pr70646.c Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/ipa-inline-analysis.c branches/gcc-4_9-branch/gcc/ipa-inline.h branches/gcc-4_9-branch/gcc/ipa-prop.c branches/gcc-4_9-branch/gcc/ipa-prop.h branches/gcc-4_9-branch/gcc/lto-streamer.h branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
Author: jamborm Date: Thu May 19 15:19:59 2016 New Revision: 236476 URL: https://gcc.gnu.org/viewcvs?rev=236476&root=gcc&view=rev Log: Respect --param ipa-max-agg-items=0 2016-05-19 Martin Jambor <mjambor@suse.cz> PR ipa/70646 * ipa-prop.c (determine_locally_known_aggregate_parts): Bail out early if parameter PARAM_IPA_MAX_AGG_ITEMS is zero. Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/ipa-prop.c
Fixed on all branches.