Summary: | [5 Regression] Linux-3.10.75 on arm926ej-s does not boot due to wrong code generation | ||
---|---|---|---|
Product: | gcc | Reporter: | Christian Eggers <christian.eggers> |
Component: | rtl-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | critical | CC: | ceggers, jakub, pageexec, ramana, wilson |
Priority: | P2 | Keywords: | wrong-code |
Version: | 5.1.1 | ||
Target Milestone: | 5.4 | ||
Host: | Target: | arm-arm926ejs-linux-gnueabi | |
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2015-04-30 00:00:00 | |
Attachments: |
Preprocessed source
Code compiled with r212399 Code compiled with r212400 Experimental patch to force out-of-ssa to use promoted subregs for phi nodes. A possibly better patch, to modify ARM port to stop changing signed HI/QI to unsigned. |
The type of field_width is a signed short. On that basis, the use of LSR rather than ASR looks highly suspicious. The problem seems to be creeping in during the initial expand pass. I can't be sure, but my suspicions are leaning towards this being some artifact of the ipa-sra decomposition of the 'spec' argument. code built with -fno-ipa-sra appears to be ok on a visual inspection. (In reply to Richard Earnshaw from comment #2) > code built with -fno-ipa-sra appears to be ok on a visual inspection. Unfortunately I'll be on vacation until Tuesday, 7th. Then I can try to build to whole kernel with this flag. If -fno-ipa-sra cures this, I wonder if it hasn't started with r221348. Some issues related to that were fixed with r221795, but some clearly haven't - e.g. I believe profiledbootstrap is still broken on arm and today I'm reducing another runtime failure that regressed with r221348 on armv7hl and r221795 doesn't fix that. -fno-ipa-sra cures that as well. (In reply to Richard Earnshaw from comment #2) > code built with > -fno-ipa-sra appears to be ok on a visual inspection. linux-3.10.25 boots fine if globally compiled with additional -fno-ipa-sra The affected linux system is a arm926ej-s. Is this bug also relevant for my Cortex-M boards? Just a feedback for another target platform (arm-none-eabi): I also had problems with a kernel 4.0.4 on an iConnect (Kirkwood variant) when I used gcc 5.1.0. Using KCFLAGS='-fno-ipa-sra' helped immediately. I did not look into the generated code, though, but I assume this is the same problem. The issue is also present when compiling kernels for OMAP1/2 platforms. Also looks like -Os is needed to trigger this. Kernels compiled with -O2 seem to work. Can confirm this on the 5-20150519 snapshot. Here's a self-contained snippet demonstrates this problem (from vsprintf.c): https://gcc.gnu.org/bugzilla/attachment.cgi?id=35613 Created attachment 35714 [details] Code compiled with r212399 Created attachment 35715 [details] Code compiled with r212400 There is a change from arithmetic shift to logical shift. There is another testcase in 66271. The LSR is emitted because the arm port has PROMOTE_MODE defined to make chars and shorts unsigned when extended to int. So the problem seems to be why it wasn't re-sign-extended before the compare. Maybe an optimization pass incorrectly deleted that sign-extend. Part of the problem here is that the arm port defines PROMOTE_MODE to force short local variables to unsigned, but arm_promote_function_mode doesn't change signedness. So a signed short parameter and a signed short local have different representations in a register. Another part of the problem is that the eipa_sra pass is creating phi nodes that mix parameters and local variables. Copying a signed short local into a signed short parameter requires a conversion, even though they have the same type, because they get different in register promotion. Another part of the problem is that the out-of-ssa pass when evaluating phi nodes doesn't go through expand_expr, and hence doesn't handle mode promotion. The code is copying DECL_RTL directly into SA.partition_to_pseudo, and then using SA.partition_to_pseudo when emitting the copy instructions. Thus these phi nodes won't emit any conversion instructions even though one is apparently required. I can get correct code if I pass DECL_RTL through expand_expr before storing it into SA.partition_to_pseudo for VAR_DECLs. However, if I then do the same thing for PARM_DECLS, then I get a movhi insn instead of a conversion, because both the source and dest are promoted subregs. I think the problem here is that emit_partition_copy doesn't handle promoted subregs. I think we need something like what store_expr_with_bounds does, when the dest is a promoted subreg, to do the copy to the smaller mode, and then convert to the wider mode. But I have a few too many suppositions at this point and am not sure if this makes sense. It might be simpler to change the arm port so that promote_mode and function_promote_mode match. Created attachment 35774 [details]
Experimental patch to force out-of-ssa to use promoted subregs for phi nodes.
Created attachment 35775 [details]
A possibly better patch, to modify ARM port to stop changing signed HI/QI to unsigned.
This would require performance testing to see what the effect is. This is a simpler patch than trying to change the out-of-ssa pass.
(In reply to Jim Wilson from comment #14) > Created attachment 35775 [details] > A possibly better patch, to modify ARM port to stop changing signed HI/QI to > unsigned. > > This would require performance testing to see what the effect is. This is a > simpler patch than trying to change the out-of-ssa pass. linux-3.10.75 boots successful when compiled by a gcc built with this patch. Shall I also test the other patch? On Tue, Jun 16, 2015 at 8:03 AM, christian.eggers at kathrein dot de <gcc-bugzilla@gcc.gnu.org> wrote: > Shall I also test the other patch? The out-of-ssa patch is unfinished, and won't work without more changes. Jim I've been researching how the ARM PROMOTE_MODE reached its current state. There are a number of issues here. 1) There is a comment that says zero extension is faster for char than signed extension. This is only true before the [su]xtb instructions were added, i.e. !arm_arch6. Before these instructions, zero extend can be done with an and immediate, but sign extension requires two shifts. So there is no benefit to zero extension for current architectures. 2) The comment doesn't apply to shorts, which require two shifts for both zero and sign extension when !arm_arch6. The signedness was changed for shorts originally because some processors did not support unaligned accesses, and had to loads shorts with two load byte instructions. This required 3 insns for zero extend and 4 for sign extend, so there was an advantage to zero extension, but only if the processor did not support unaligned accesses. Unfortunately, the support was flawed, and did UNSIGNEDP = TARGET_MMU_TRAPS != 0; \ which forces it to unsigned for MMU_TRAPS and signed for the default case. It should have been something like if (TARGET_MMU_TRAPS) UNSIGNEDP = 1; \ When the TARGET_MMU_TRAPS feature was removed, this code was changed to set UNSIGNEDP to 0 unconditionally, which compounded the error. This code should have been removed when the TARGET_MMU_TRAPS feature was removed. 3) Originally PROMOTE_FUNCTION_ARGS was a boolean that indicated whether PROMOTE_MODE applied to function arguments or not. When it was changed to a hook, it was made into a function. So now we have two sets of similar code, one in PROMOTE_MODE and one in arm_promote_function_mode (TARGET_PROMOTE_FUNCTION_MODE). The two sets of code should be identical, but the unsigned-ness changing was left out of the arm_promote_function_mode code. The result is that we have two different in-register representations for signed char/short symbols, one form for parameters and one form for local variables. This inconsistency is confusing the out-of-ssa pass. Since changing arm_promote_function_mode is effectively an ABI change, it is safer to change PROMOTE_MODE to match. 4) There is a general principle that you should not change signedness in PROMOTE_MODE unless the hardware forces you to do so. For instance, for the MIPS64 target, the hardware requires that all 32-bit values in 64-bit registers be sign-extended, and 32-bit instructions may trap if this is not true. Thus the mips port forces all 32-bit values to be signed. But the arm architecture has no similar constraint, and hence we should not be changing signedness here. Changing signedness results in less efficient code, as can be seen in this testcase, where the compiler must emit both a zero-extend and a sign-extend in the loop. The sign-extend is required to get correct results for a compare, and the zero-extend is required because PROMOTE_MODE requires it. Ultimately, I believe that this is an ARM backend bug. PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should not behave differently. It would help if the PROMOTE_MODE macro was merged with the TARGET_PROMOTE_FUNCTION_MODE hook, to avoid accidents like this. I've tested my patch to modify PROMOTE_MODE so that it no longer sets UNSIGNEDP for char and short. For the SPEC CPU2000 benchmarks, individual benchmarks are within 1% which is within the noise range, and the full benchmark results have almost identical performance. I get 3 additional failures in the gcc testsuite, for gcc.target/arm/wmul-[123].c. These are testcases to verify generation of the smulbb and smlabb instruction. However, they only work currently because of the extra sign-extends emitted by PROMOTE_MODE. We currently emit an unsigned short load, a sign-extend, and a multiply. The sign-extend gets merged into the multiply. But with the patch, we emit a signed short load and a multiply, and hence can't form smulbb. Unpatched, for wmul-1.c we get ldrh r1, [r4, #2]! ldrh r6, [r0, #2]! smlabb r5, r1, r6, r5 smlabb r2, r1, r1, r2 and patched we get ldrsh r1, [r4, #2]! ldrsh r6, [r0, #2]! mla r5, r1, r6, r5 mla r2, r1, r1, r2 wmul-2.c is similar. There is a bigger difference with wmul-3.c. Unpatched is ldrh r1, [r5, #2]! ldrh r4, [r0, #2]! smulbb r4, r1, r4 subs r6, r6, r4 smulbb r1, r1, r1 subs r2, r2, r1 whereas patched is ldrsh r1, [r4, #2]! ldrsh r6, [r0, #2]! mls r5, r1, r6, r5 mls r2, r1, r1, r2 The patched code is better or equivalent to the unpatched code in all cases, but these testcases no longer serve their purpose. I can fix wmul-1.c by changing types to int and casting to signed short. This doesn't work for wmul-2.c because the scalar sign-extend is moved out of the loop, and no longer available to merge with the multiple. This also doesn't work for wmul-3.c, but only because it is cse'd differently. I get unpatched smulbb r4, r1, r4 subs r6, r6, r4 and patched sxth r4, r4 mls r6, r1, r4, r6 At the moment the only option I have to make wmul-2.c and wmul-3.c work is to remove them GCC 5.2 is being released, adjusting target milestone to 5.3. *** Bug 66271 has been marked as a duplicate of this bug. *** (In reply to Jim Wilson from comment #18) > Ultimately, I believe that this is an ARM backend bug. PROMOTE_MODE and > TARGET_PROMOTE_FUNCTION_MODE should not behave differently. It would help > if the PROMOTE_MODE macro was merged with the TARGET_PROMOTE_FUNCTION_MODE > hook, to avoid accidents like this. > > I've tested my patch to modify PROMOTE_MODE so that it no longer sets > UNSIGNEDP for char and short. > > For the SPEC CPU2000 benchmarks, individual benchmarks are within 1% which > is within the noise range, and the full benchmark results have almost > identical performance. This is for armv7-a ? Thumb2 / ARM state ? > > I get 3 additional failures in the gcc testsuite, for > gcc.target/arm/wmul-[123].c. These are testcases to verify generation of > the smulbb and smlabb instruction. However, they only work currently > because of the extra sign-extends emitted by PROMOTE_MODE. We currently > emit an unsigned short load, a sign-extend, and a multiply. The sign-extend > gets merged into the multiply. But with the patch, we emit a signed short > load and a multiply, and hence can't form smulbb. Unpatched, for wmul-1.c > we get > ldrh r1, [r4, #2]! > ldrh r6, [r0, #2]! > smlabb r5, r1, r6, r5 > smlabb r2, r1, r1, r2 > and patched we get > ldrsh r1, [r4, #2]! > ldrsh r6, [r0, #2]! > mla r5, r1, r6, r5 > mla r2, r1, r1, r2 There are 2 considerations here. smlabb, smulbb on some of the older cores are more efficient than the equivalent mla instructions there can be a 1 cycle issue delta and a 1 cycle result latency overhead. Further more in Thumb2 - ldrsh has a smaller immediate range than ldrh and thus might cause more inefficient register usage compared to the original case. I think investigating performance on an older core across a range of benchmarks (not relying on just spec but looking at something like the popular standard embedded benchmark would also be interesting) as I don't think this exposes enough 16 bit multiplication operations would be interesting. > wmul-2.c is similar. There is a bigger difference with wmul-3.c. Unpatched > is > ldrh r1, [r5, #2]! > ldrh r4, [r0, #2]! > smulbb r4, r1, r4 > subs r6, r6, r4 > smulbb r1, r1, r1 > subs r2, r2, r1 > whereas patched is > ldrsh r1, [r4, #2]! > ldrsh r6, [r0, #2]! > mls r5, r1, r6, r5 > mls r2, r1, r1, r2 > The patched code is better or equivalent to the unpatched code in all cases, > but these testcases no longer serve their purpose. I can fix wmul-1.c by > changing types to int and casting to signed short. This doesn't work for > wmul-2.c because the scalar sign-extend is moved out of the loop, and no > longer available to merge with the multiple. This also doesn't work for > wmul-3.c, but only because it is cse'd differently. I get unpatched > smulbb r4, r1, r4 > subs r6, r6, r4 > and patched > sxth r4, r4 > mls r6, r1, r4, r6 For e.g. on the Cortex-A9, this is now turning a 4 cycle sequence into a 6 cycle sequence. On older cores, this is worse again as there is no sxth instruction, we've just introduced a couple of additional shifts and that would cost in terms of code size. I do think that before this is submitted back into mainline we need to check across a range of architecture levels the effects of this change. 1 It would be good to look at this for something like libavcodec or fft for some scalar arithmetic code especially to do with short integers. 2. Comparing code generation in terms of code size and / or examples to satisfy that the PROMOTE_MODE change is not sub-optimal would be good. > > At the moment the only option I have to make wmul-2.c and wmul-3.c work is > to remove them So in an attempt to make some progress on this, I've tried Jim's approach of changing PROMOTE_MODE to not convert the short modes to unsigned. Building SPEC2006 INT and looking at the generated assembly actually doesn't look too bad, with the codesize being a tiny bit smaller overall with this patch. Many load operations followed by extend are expressed as single ldrsh instructions. Then there's the ldrh+smullbb vs ldrsh+mla issue. I'm looking at the wmul-1.c testcase. The cse1 pass is the one that eliminates the sign-extends from the arms of the plus-mult. Before cse1 we have: (insn 41 40 43 3 (set (reg:SI 136 [ _12 ]) (sign_extend:SI (mem:HI (reg:SI 135)))) (insn 44 43 47 3 (set (reg:SI 138 [ _16 ]) (sign_extend:SI (mem:HI (reg:SI 144)))) (insn 47 44 49 3 (set (reg/v:SI 141 [ dotp ]) (plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 136 [ _12 ]) 0)) (sign_extend:SI (subreg/s/u:HI (reg:SI 138 [ _16 ]) 0))) (reg/v:SI 141 [ dotp ])))) (insn 49 47 51 3 (set (reg/v:SI 153 [ sqr ]) (plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 136 [ _12 ]) 0)) (sign_extend:SI (subreg/s/u:HI (reg:SI 136 [ _12 ]) 0))) (reg/v:SI 153 [ sqr ])))) and cse1 transforms this into: (insn 41 40 43 3 (set (reg:SI 136 [ _12 ]) (sign_extend:SI (mem:HI (reg:SI 135 [ ivtmp.10 ]))))) (insn 44 43 47 3 (set (reg:SI 138 [ _16 ]) (sign_extend:SI (mem:HI (reg:SI 144 [ ivtmp.13 ]))))) (insn 47 44 49 3 (set (reg/v:SI 141 [ dotp ]) (plus:SI (mult:SI (reg:SI 136 [ _12 ]) (reg:SI 138 [ _16 ])) (reg/v:SI 141 [ dotp ])))) (insn 49 47 51 3 (set (reg/v:SI 153 [ sqr ]) (plus:SI (mult:SI (reg:SI 136 [ _12 ]) (reg:SI 136 [ _12 ])) (reg/v:SI 153 [ sqr ])))) Notice how the extends on the loads remain but the extends from the plus-mult have disappeared. Before Jim's patch the same sequence is: (insn 41 40 43 3 (set (reg:SI 136 [ _12 ]) (zero_extend:SI (mem:HI (reg:SI 135 [ ivtmp.10 ]))))) (insn 44 43 47 3 (set (reg:SI 138 [ _16 ]) (zero_extend:SI (mem:HI (reg:SI 144 [ ivtmp.13 ]))))) (insn 47 44 49 3 (set (reg/v:SI 141 [ dotp ]) (plus:SI (mult:SI (sign_extend:SI (subreg/s/v:HI (reg:SI 136 [ _12 ]) 0)) (sign_extend:SI (subreg/s/v:HI (reg:SI 138 [ _16 ]) 0))) (reg/v:SI 141 [ dotp ])))) (insn 49 47 51 3 (set (reg/v:SI 153 [ sqr ]) (plus:SI (mult:SI (sign_extend:SI (subreg/s/v:HI (reg:SI 136 [ _12 ]) 0)) (sign_extend:SI (subreg/s/v:HI (reg:SI 136 [ _12 ]) 0))) (reg/v:SI 153 [ sqr ])))) So the sign-extends inside the plus-mult don't match the zero_extends from the loads, so it can't cse the sign-extends away. If the SImode mult-accumulate is indeed more expensive than extend-mult-accumulate on some cores, then cse and costs should have gated that transformation? Worth looking into IMO. If we fix these widening-mult-accumulate issues will Jim's arm backend patch be a valid patch correctness-wise? GCC 5.3 is being released, adjusting target milestone. *** Bug 67295 has been marked as a duplicate of this bug. *** Author: ktkachov Date: Thu Feb 4 09:50:12 2016 New Revision: 233130 URL: https://gcc.gnu.org/viewcvs?rev=233130&root=gcc&view=rev Log: [ARM] PR target/65932: stop changing signedness in PROMOTE_MODE 2016-02-04 Jim Wilson <jim.wilson@linaro.org> PR target/65932 PR target/67714 * config/arm/arm.h (PROMOTE_MODE): Don't set UNSIGNEDP for QImode and HImode. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.h Author: ktkachov Date: Thu Feb 4 09:51:35 2016 New Revision: 233131 URL: https://gcc.gnu.org/viewcvs?rev=233131&root=gcc&view=rev Log: [ARM][1/4] PR target/65932: Add testcase PR target/65932 PR target/67714 * gcc.c-torture/execute/pr67714.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr67714.c Modified: trunk/gcc/testsuite/ChangeLog Author: ktkachov Date: Thu Feb 4 09:54:37 2016 New Revision: 233132 URL: https://gcc.gnu.org/viewcvs?rev=233132&root=gcc&view=rev Log: [ARM][2/4] Fix operand costing logic for SMUL[TB][TB] PR target/65932 PR target/67714 * config/arm/arm.c (arm_new_rtx_costs, MULT case): Properly extract the operands of the SIGN_EXTENDs from a SMUL[TB][TB] rtx. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c Author: ktkachov Date: Thu Feb 4 09:56:13 2016 New Revision: 233133 URL: https://gcc.gnu.org/viewcvs?rev=233133&root=gcc&view=rev Log: [cse][3/4] Don't overwrite original rtx when folding source of set PR target/65932 PR target/67714 * cse.c (cse_insn): Pass NULL to fold_rtx when initially folding the source of a SET. Modified: trunk/gcc/ChangeLog trunk/gcc/cse.c Author: ktkachov Date: Thu Feb 4 09:57:36 2016 New Revision: 233134 URL: https://gcc.gnu.org/viewcvs?rev=233134&root=gcc&view=rev Log: [ARM][4/4] Adjust gcc.target/arm/wmul-[123].c tests PR target/65932 PR target/67714 * gcc.target/arm/wmul-3.c: Simplify test to generate just a single smulbb instruction. * gcc.target/amr/wmul-1.c: Add -mtune=cortex-a9 to dg-options. * gcc.target/amr/wmul-2.c: Likewise. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/arm/wmul-1.c trunk/gcc/testsuite/gcc.target/arm/wmul-2.c trunk/gcc/testsuite/gcc.target/arm/wmul-3.c Fixed for GCC 6. I've asked for a backport to GCC 5 and if approved I'll do it after it had a bit of time to bake on trunk. Author: ktkachov Date: Tue Feb 16 15:11:26 2016 New Revision: 233454 URL: https://gcc.gnu.org/viewcvs?rev=233454&root=gcc&view=rev Log: [ARM] PR target/65932: stop changing signedness in PROMOTE_MODE Backport from trunk: 2016-02-04 Jim Wilson <jim.wilson@linaro.org> PR target/65932 PR target/67714 * config/arm/arm.h (PROMOTE_MODE): Don't set UNSIGNEDP for QImode and HImode. Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/config/arm/arm.h Author: ktkachov Date: Tue Feb 16 15:13:05 2016 New Revision: 233455 URL: https://gcc.gnu.org/viewcvs?rev=233455&root=gcc&view=rev Log: [ARM] Adjust tests after fix for PR 65932 PR target/65932 * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options. xfail the scan-assembler test. * gcc.target/arm/wmul-2.c: Likewise. * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb. Modified: branches/gcc-5-branch/gcc/testsuite/ChangeLog branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/wmul-1.c branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/wmul-2.c branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/wmul-3.c Fixed for GCC 5.4 and GCC 6. Phew. Author: clyon Date: Mon Mar 7 08:18:25 2016 New Revision: 234019 URL: https://gcc.gnu.org/viewcvs?rev=234019&root=gcc&view=rev Log: 2016-03-07 Christophe Lyon <christophe.lyon@linaro.org> Backport from mainline 2016-02-04 Kyrylo Tkachov <kyrylo.tkachov@arm.com> PR target/65932 PR target/67714 * config/arm/arm.c (arm_new_rtx_costs, MULT case): Properly extract the operands of the SIGN_EXTENDs from a SMUL[TB][TB] rtx. Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/config/arm/arm.c |
Created attachment 35424 [details] Preprocessed source gcc version: 5.1.0 / 5.1.1 (current git snapshot) host : x86_64_suse_linux (13.1) target: arm-arm926ejs-linux-gnueabi Configured with: /home/eggers/Projekte/cds/repos/software/tools/toolchain/src/gcc-gcc-5-branch-1e71966/configure --prefix=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/arm-arm926ejs-linux-gnueabi/arm-arm926ejs-linux-gnueabi --with-pkgversion='Kathrein CDS 15278' --target=arm-arm926ejs-linux-gnueabi --disable-multilib --mandir=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/arm-arm926ejs-linux-gnueabi/arm-arm926ejs-linux-gnueabi/share/man --infodir=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/arm-arm926ejs-linux-gnueabi/arm-arm926ejs-linux-gnueabi/share/info --with-gmp=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/install --with-mpfr=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/install --with-mpc=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/install --with-isl=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/install --with-sysroot=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/arm-arm926ejs-linux-gnueabi/arm-arm926ejs-linux-gnueabi/sys-root --disable-nls --with-slibdir=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/arm-arm926ejs-linux-gnueabi/arm-arm926ejs-linux-gnueabi/sys-root/lib --with-gxx-include-dir=/home/eggers/Projekte/cds/repos/software/tools/toolchain/build/arm-arm926ejs-linux-gnueabi/arm-arm926ejs-linux-gnueabi/sys-root/usr/include/c++/gcc-5-branch-1e71966 --enable-shared --enable-__cxa_atexit --enable-c99 --enable-threads=posix --disable-install-libiberty --with-system-zlib --with-cpu=arm926ej-s --with-float=soft --with-mode=arm --enable-languages=c,c++ --enable-long-long command line: arm-arm926ejs-linux-gnueabi-gcc -Wp,-MD,lib/.vsprintf.o.d -nostdinc -isystem /home/eggers/Projekte/cds/repos/software/tools/toolchain/build/arm-arm926ejs-linux-gnueabi/arm-arm926ejs-linux-gnueabi/lib64/gcc/arm-arm926ejs-linux-gnueabi/5.1.0/include -I/home/eggers/Projekte/cds/repos/software/cts/common/platform_linux/external/kernel/linux/linux-3.10/arch/arm/include -Iarch/arm/include/generated -I/home/eggers/Projekte/cds/repos/software/cts/common/platform_linux/external/kernel/linux/linux-3.10/include -Iinclude -I/home/eggers/Projekte/cds/repos/software/cts/common/platform_linux/external/kernel/linux/linux-3.10/arch/arm/include/uapi -Iarch/arm/include/generated/uapi -I/home/eggers/Projekte/cds/repos/software/cts/common/platform_linux/external/kernel/linux/linux-3.10/include/uapi -Iinclude/generated/uapi -include /home/eggers/Projekte/cds/repos/software/cts/common/platform_linux/external/kernel/linux/linux-3.10/include/linux/kconfig.h -I/home/eggers/Projekte/cds/repos/software/cts/common/platform_linux/external/kernel/linux/linux-3.10/lib -Ilib -D__KERNEL__ -mlittle-endian -I/home/eggers/Projekte/cds/repos/software/cts/common/platform_linux/external/kernel/linux/linux-3.10/arch/arm/mach-at91/include -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -Wno-maybe-uninitialized -fno-dwarf2-cfi-asm -mabi=aapcs-linux -mno-thumb-interwork -funwind-tables -marm -D__LINUX_ARM_ARCH__=5 -march=armv5te -mtune=arm9tdmi -msoft-float -Uarm -Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -fno-var-tracking-assignments -g -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(vsprintf)" -D"KBUILD_MODNAME=KBUILD_STR(vsprintf)" -c -o lib/vsprintf.o /home/eggers/Projekte/cds/repos/software/cts/common/platform_linux/external/kernel/linux/linux-3.10/lib/vsprintf.c linux-3.10.75 compiled with gcc-4.9.2 --> running same kernel compiled with gcc-5.1.0 / gcc-5.1.1 (git snapshot) --> NOT running Observations Function string() (part of vsnprintf()) in lib/vsprintf.c is not compiled correctly: f5c: e1dd51f8 ldrsh r5, [sp, #24] [...] # r5 = 0xFFFF FFFF f70: e2453001 sub r3, r5, #1 # r3 = 0xFFFF FFFE f74: e1a03803 lsl r3, r3, #16 # r3 = 0xFFFE 0000 while (len < spec.field_width--) { f78: e1500005 cmp r0, r5 # r0 = 8 # r5 = 0xFFFF FFFF --> correct f7c: e1a03823 lsr r3, r3, #16 # r3 = 0x0000 FFFE f80: aa000005 bge f9c <string.isra.5+0x64> # branch taken --> correct [...] f9c: e1a05003 mov r5, r3 # r5 = r3 = 0x0000 FFFE [...] while (len < spec.field_width--) { fdc: e1500005 cmp r0, r5 # r0 (len) = 8 # r5 (spec.field_width) is still 0x0000 FFFE --> not ok # spec.field_width should be 0xFFFF FFFE at this point fe0: e1a03823 lsr r3, r3, #16 fe4: aa000005 bge 1000 <string.isra.5+0xc8> # branch NOT taken --> wrong