Created attachment 39937 [details] testcase Originally from the 'flint' package on Debian failing to build: https://buildd.debian.org/status/fetch.php?pkg=flint&arch=mipsel&ver=2.5.2-10&stamp=1477705959 The attached testcase when compiled and run on a 64-bit mips processor crashes with a SIGBUS error. The same binary works when run on a 32-bit mips processor. Compile with -O2 optimization like this: > gcc -O2 -march=mips32r2 ldxc1.c -o ldxc1 The crash occurs at the ldxc1 instruction at the end of this exerpt from the inner loop in the ldxc1_test function: > 8e0: 02028023 subu s0,s0,v0 > 8e4: afbf003c sw ra,60(sp) > 8e8: 0080b025 move s6,a0 > 8ec: 8fb3005c lw s3,92(sp) > [loop begins here] > 8f0: 02e0a825 move s5,s7 > 8f4: 26d6ffff addiu s6,s6,-1 > 8f8: 0256102a slt v0,s2,s6 > 8fc: 10400012 beqz v0,948 <ldxc1_test+0xd8> > 900: 8f998054 lw t9,-32684(gp) > 904: 8e620000 lw v0,0(s3) > 908: 8ea6fff8 lw a2,-8(s5) > 90c: 8ee30000 lw v1,0(s7) > 910: 00551021 addu v0,v0,s5 > 914: 26b5fffc addiu s5,s5,-4 > 918: 00511021 addu v0,v0,s1 > 91c: 00c33023 subu a2,a2,v1 > 920: 8c420000 lw v0,0(v0) > 924: 00541021 addu v0,v0,s4 > 928: 2694fff8 addiu s4,s4,-8 > 92c: 4e020301 ldxc1 $f12,v0(s0) Before the ldxc1 instruction is executed, gdb reports that the values in v0 and s0 are both large integers (above 0x80000000): (gdb) print/x $v0 $1 = 0xfffee7f8 (gdb) print/x $s0 $2 = 0x80008b50 When added together, the lower 32-bits contains the correct pointer (in this case on the stack). On a 32-bit processor this is fine. On a 64-bit processor, we know that v0 and s0 are sign extended as the last instructions to touch them were the addu at 924 and the subu at 8e0. So the values in the registers are actually: v0 = 0xfffffffffffee7f8 s0 = 0xffffffff80008b50 Adding these together (modulo 64-bit) gives the final pointer of 0xffffffff7fff7348 which is outside the user address space and thus results in a SIGBUS. I think GCC is assuming that the address calculated by the ldxc1 instruction is modulo 32-bit when compiled for a 32-bit processor. However, this is not true if the code is later run on a 64-bit processor.
Created attachment 39938 [details] testcase v2 Testcase actually initializes the arrays this time :) The bug still occurs.
I think this code is undefined if you have wrapping pointers. No pointer should ever be above INT_MAX in user space on mips32 due to the memory layout on MIPS32.
As far as I can tell, all the pointers in the original C code are valid and do not wrap. Some of the registers wrap, but they're not pointers (until added with other registers).
Then the problem is a reassociation issue. That is we should never get an overflow happening for pointers for MIPS.
I'm struggling to see what can be fixed in GCC for this (and I'm not sure I follow why there is a problem with GCC's behaviour) but... The issue may stem from the C front end where the dumps start off as below. Note that the '-1' in kappa-1 has ended up being represented as 1073741823 (0x3fffffff) presumably owing to the fact it will be multiplied by 4 eventually and hence be 'safe'. I can't figure out so far what creates this value but the behavior started some time between gcc 4.7 and 4.8 I believe. The questionably bad code then shows up dependent on optimisations and instruction availability. If the calculation had been done with ADDU instead of in the LDXC1 then all would be well. Regardless of whether this is a real bug for the compiler then the Linux kernel should be making use of the status.UX bit in order to properly emulate a 32-bit environment for o32 on 64-bit kernel/hardware and had this been run using a 32-bit kernel (on 64-bit hardware) then it would run correctly. There is an issue for N32 code with the UX bit in that you also need the PX bit available to independently control 64-bit instruction availability vs address space but this is only missing in MIPSIV cores. ;; Function ldxc1_test (null) ;; enabled by -tree-original { int kappa2 = kappa; double tmp = 0.0; int kappa2 = kappa; double tmp = 0.0; <D.1497>:; kappa-- ; if (zeros + 1 < kappa) { tmp = *(*(r->rows + ((sizetype) kappa + 1073741823) * 4) + ((sizetype) kappa + 536870911) * 8) * ctt; tmp = ldexp (tmp, *(expo + ((sizetype) kappa + 1073741823) * 4) - *(expo + (sizetype) ((unsigned int) kappa2 * 4))); } if (zeros + 1 < kappa && *(s + ((sizetype) kappa + 536870911) * 8) <= tmp) goto <D.1497>; else goto <D.1498>; <D.1498>:; return tmp;
> The issue may stem from the C front end where the dumps start off as below. > Note that the '-1' in kappa-1 has ended up being represented as 1073741823 > (0x3fffffff) presumably owing to the fact it will be multiplied by 4 > eventually and hence be 'safe'. All pointer calculations are done in sizetype and it is unsigned. This kind of issues generally come from the RTL level, maybe expansion here.
(In reply to Eric Botcazou from comment #6) > > The issue may stem from the C front end where the dumps start off as below. > > Note that the '-1' in kappa-1 has ended up being represented as 1073741823 > > (0x3fffffff) presumably owing to the fact it will be multiplied by 4 > > eventually and hence be 'safe'. > > All pointer calculations are done in sizetype and it is unsigned. This kind > of issues generally come from the RTL level, maybe expansion here. The expansion looks like an acceptable transformation to me i.e. it is not introducing the overflow for the offending pointer just maintaining what is already in the tree. I dug back to reassoc2 based on Andrew's comment which gets the following as input: _48 = (sizetype) kappa_6(D); _49 = _48 * 8; _50 = s_37(D) + _49; ivtmp.19_47 = (unsigned int) _50 and then in the loop: _54 = (sizetype) s_37(D); _55 = ivtmp.19_3 - _54; _56 = _55 + 4294967280; # RANGE [0, 4294967295] NONZERO 4294967288 _19 = _56; # PT = nonlocal escaped _20 = _17 + _19; # VUSE <.MEM_4> _21 = *_20 This includes a pointer subtraction of 's' (somewhat pointless but valid I believe). reassoc2 changes it to this: _48 = (sizetype) kappa_6(D); _49 = _48 * 8; _50 = s_37(D) + _49; ivtmp.19_47 = (unsigned int) _50; and in the loop: _54 = (sizetype) s_37(D); _38 = -_54; _25 = 4294967280 - _54; _56 = _25 + ivtmp.19_3; # RANGE [0, 4294967295] NONZERO 4294967288 _19 = _56; # PT = nonlocal escaped _20 = _17 + _19; # VUSE <.MEM_4> _21 = *_20; And _56 is where we get the overflow. I'm still not sure if there is really a bug. Should reassoc not be doing this for 'sizetype'? Should ivopts not have created the mess in the first place? Would changing either of these actually introduce an assurance that this situation won't occur from other optimisations?
> The expansion looks like an acceptable transformation to me i.e. it is not > introducing the overflow for the offending pointer just maintaining what is > already in the tree. Wrap around for unsigned types is OK but, if expansion does implicit extensions to larger types, then things can easily go wrong. > I'm still not sure if there is really a bug. Should reassoc not be doing > this for 'sizetype'? Should ivopts not have created the mess in the first > place? Would changing either of these actually introduce an assurance that > this situation won't occur from other optimisations? sizetype is unsigned so all the transformations looks valid.
(In reply to Eric Botcazou from comment #8) > > The expansion looks like an acceptable transformation to me i.e. it is not > > introducing the overflow for the offending pointer just maintaining what is > > already in the tree. > > Wrap around for unsigned types is OK but, if expansion does implicit > extensions to larger types, then things can easily go wrong. Sure, and in this case there are no implicit extensions to larger types. The bug does require an implicit extension to occur but this only happens at runtime when on 64-bit hardware and the core is not configured to be limited to a 32-bit address space which is always true for a 64-bit kernel as it turns out. > > I'm still not sure if there is really a bug. Should reassoc not be doing > > this for 'sizetype'? Should ivopts not have created the mess in the first > > place? Would changing either of these actually introduce an assurance that > > this situation won't occur from other optimisations? > > sizetype is unsigned so all the transformations looks valid. I'm leaning heavily towards not-a-compiler-bug but may need to offer a workaround until we can solve this in the Linux kernel. The only workaround possible is to provide an option to disable register+index addressing i.e. remove [ls][wd]xc1 instructions.
> Sure, and in this case there are no implicit extensions to larger types. The > bug does require an implicit extension to occur but this only happens at > runtime when on 64-bit hardware and the core is not configured to be limited > to a 32-bit address space which is always true for a 64-bit kernel as it > turns out. OK, if the compiler doesn't know that larger types are involved, then indeed it cannot do anything to prevent implicit extensions. Such a configuration needs to define Pmode as DImode and ptr_mode as SImode for the compiler to work properly.
TBH this does look like trying to rely on UB to me, as per section 6.5.6 "Additive operators" clause 8 of the C language standard, which states (among others): "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined." Here under the triggering conditions the pointer the integer is added to with LDXC1 does not point to an element of the array operated on (or to one past the last), so the hardware operation matches the standard's semantics WRT overflow and I don't think we ought to be pushing it. So it looks like a middle end bug to me and the backend is fine in faithfully assuming its RTL pattern won't be passed operands leading to UB. Have I missed anything? Maciej
Bisected the problem to commit r216501: commit 9a416363e99c9f2d48fa810e220bc2f7904f1788 Author: zqchen <zqchen@138bc75d-0d04-0410-961f-82ee72b054a4> Date: Tue Oct 21 03:38:37 2014 +0000 2014-10-21 Zhenqiang Chen <zhenqiang.chen@arm.com> * cfgloopanal.c (seq_cost): Delete. * rtl.h (seq_cost): New prototype. * rtlanal.c (seq_cost): New function. * tree-ssa-loop-ivopts.c (seq_cost): Delete. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@216501 138bc75d-0d04-0410-961f-82ee72b054a4 More analysis to follow. Given the short time until the release, we plan submit a patch to provide a target flag and build option to avoid the bug.
(In reply to Maciej W. Rozycki from comment #11) > TBH this does look like trying to rely on UB to me, as per section > 6.5.6 "Additive operators" clause 8 of the C language standard, which > states (among others): > > "If both the pointer operand and the result point to elements of the > same array object, or one past the last element of the array object, > the evaluation shall not produce an overflow; otherwise, the behavior > is undefined." > > Here under the triggering conditions the pointer the integer is added > to with LDXC1 does not point to an element of the array operated on (or > to one past the last), so the hardware operation matches the standard's > semantics WRT overflow and I don't think we ought to be pushing it. Aren't language rules mostly irrelevant at this stage though? We don't really have the concept of a pointer after generating RTL we just happen to have some values in a mode that is the same as Pmode. mips.h comment for Pmode: /* Specify the machine mode that pointers have. After generation of rtl, the compiler makes no further distinction between pointers and any other objects of this machine mode. */ > So it looks like a middle end bug to me and the backend is fine in > faithfully assuming its RTL pattern won't be passed operands leading to > UB. I can't see any feature/option/setting that gives this guarantee. Why do you think the backend can make this assumption?
Author: mpf Date: Thu Jan 19 16:05:59 2017 New Revision: 244640 URL: https://gcc.gnu.org/viewcvs?rev=244640&root=gcc&view=rev Log: MIPS: PR target/78176 add -mlxc1-sxc1. gcc/ PR target/78176 * config.gcc (supported_defaults): Add lxc1-sxc1. (with_lxc1_sxc1): Add validation. (all_defaults): Add lxc1-sxc1. * config/mips/mips.opt (mlxc1-sxc1): New option. * gcc/config/mips/mips.h (OPTION_DEFAULT_SPECS): Add a default for mlxc1-sxc1. (TARGET_CPU_CPP_BUILTINS): Add builtin_define for __mips_no_lxc1_sxc1. (ISA_HAS_LXC1_SXC1): Gate with mips_lxc1_sxc1. * gcc/doc/invoke.texi (-mlxc1-sxc1): Document the new option. * doc/install.texi (--with-lxc1-sxc1): Document the new option. gcc/testsuite/ * gcc.target/mips/lxc1-sxc1-1.c: New file. * gcc.target/mips/lxc1-sxc1-2.c: Likewise. * gcc.target/mips/mips.exp (mips_option_groups): Add ghost option HAS_LXC1. (mips_option_groups): Add -m[no-]lxc1-sxc1. (mips-dg-init): Detect default -mno-lxc1-sxc1. (mips-dg-options): Handle HAS_LXC1 arch upgrade/downgrade. Added: trunk/gcc/testsuite/gcc.target/mips/lxc1-sxc1-1.c trunk/gcc/testsuite/gcc.target/mips/lxc1-sxc1-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config.gcc trunk/gcc/config/mips/mips.h trunk/gcc/config/mips/mips.opt trunk/gcc/doc/install.texi trunk/gcc/doc/invoke.texi trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/mips/mips.exp
Created attachment 40631 [details] Prototype change to backout r216501. > Bisected the problem to commit r216501: The review discussion of r216501 starts with message: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00758.html Which contains: The are two implementations of seq_cost. The function bodies are exactly the same. The patch removes one of them and make the other global. This seems the patch was cleanup that shouldn't introduce a functional change. However implementations of seq_cost are different, per final version of the patch: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00896.html cfgloopanal.c: - cost += set_rtx_cost (set, speed); rtlanal.c: + cost += set_rtx_cost (set, speed); tree-ssa-loop-ivopts.c: - cost += set_src_cost (SET_SRC (set), speed); In general, when computing the cost of a sequence of N INSNs this increases the cost of the sequence by N*4. This really throws off the costing of substituting different IVs on MIPS. The first patch attached (just a prototype) basically reverts this change. The second fixes a problem with r220473, a fix for PR62631 from Eric Botcazou: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62631#c17 This looks a generic problem in get_shiftadd_cost to me, it ought to mimic the algorithms in expmed.c, something like ... This change can lower the cost of a sequence of instruction. However there are situations this (lower) cost is being scaled by an estimated iteration count will cause the adjusted cost to now become zero. For the example attached to the second patch the IV replacement algorithm will determine that the cost using separate IVs for each load will be less than then cost of one IV for all loads. Thus, in the second patch we detect that a non-zero cost being scaled to zero should represented by one instead, which gets us back to IVSOPTS generating just one IV that will be used for all loads.
Created attachment 40632 [details] Tweak to adjust_setup_cost (r220473). Second patch associated with previous comment.
> This really throws off the costing of substituting different IVs on > MIPS. I forgot to mention that for MIPS the net of effect r216501 is to not produce indexed memory OPs in simple examples where we should. But we also will produce problematic indexed memory OPs in situations where address generation costing is a bit complicated (the original issue associated with this bug report). Applying the the two patches I just attached fixes the problem of generating indexed memory OPs in simple examples, and also will cause IVOPTS to select IVs that are similar to those that were made in the past that avoids the problem executing indexed memory OPs in O32 binaries on 64-bit MIPS processors running current Linux kernels. There is still the issue of recognizing that rewriting a "use" to use a different IV can expose a problem with indexed memory OPs on 64-bit MIPS processors, where an infinite cost should be associated in that situation, that still needs to be addressed (thus the need for the flag to turn off the generation of indexed memory OPs until this issue is addressed).
CC author and reviewers of r216501.
I agree with the comments that this (if at all) needs to be fixed at RTL expansion time where we already do quite some "hacks" for sizetype in POINTER_PLUS_EXPR context: case POINTER_PLUS_EXPR: /* Even though the sizetype mode and the pointer's mode can be different expand is able to handle this correctly and get the correct result out of the PLUS_EXPR code. */ /* Make sure to sign-extend the sizetype offset in a POINTER_PLUS_EXPR if sizetype precision is smaller than pointer precision. */ if (TYPE_PRECISION (sizetype) < TYPE_PRECISION (type)) treeop1 = fold_convert_loc (loc, type, fold_convert_loc (loc, ssizetype, treeop1)); /* If sizetype precision is larger than pointer precision, truncate the offset to have matching modes. */ but I don't see from the comments in this bug what the actual stmt is the critical one.
I'll collect more tracing data on the costing problem. Hopefully I post an update in the next few days.
Moving target to GCC 8.0 albeit that it may not be solvable. Mitigated in GCC 7.
GCC 8.1 has been released.
GCC 8.2 has been released.
GCC 8.3 has been released.
Is it possible that this same problem is applicable on the 'lwx' instruction? I am using MIPS64 n32. I first saw the original problem as described in this bug with instruction 'lwxc1'. I then used the suggested compilation flag -mno-lxc1-sxc1 which removed that problem. However, in another place in the code I get a SIGBUS on following instruction: lwx v1,a5(v1) where: v1: ffffffffeb623870 a5: ffffffff8a4ee1c4 The exception shows: badvaddr: ffffffff75b11a34 If you sum the lower 32 bits of v1+a5 you get '175b11a34' (i.e. 33 bits). Truncated to 32 bits, this is the value you see in badvaddr. Nevertheless, this address is inside a mapped and read/writable memory range, from /proc/PID/maps: 75b00000-75b21000 rw-p 00000000 00:00 0 This behavior looks very similar to this bug's description. If this same problem indeed also applies to 'lwx', is there a workaround for it?
(In reply to Thomas De Schampheleire from comment #25) > Is it possible that this same problem is applicable on the 'lwx' instruction? > I am using MIPS64 n32. > > I first saw the original problem as described in this bug with instruction > 'lwxc1'. I then used the suggested compilation flag -mno-lxc1-sxc1 which > removed that problem. > > However, in another place in the code I get a SIGBUS on following > instruction: > > lwx v1,a5(v1) > > where: > v1: ffffffffeb623870 > a5: ffffffff8a4ee1c4 > > The exception shows: > badvaddr: ffffffff75b11a34 > > If you sum the lower 32 bits of v1+a5 you get '175b11a34' (i.e. 33 bits). > Truncated to 32 bits, this is the value you see in badvaddr. > Nevertheless, this address is inside a mapped and read/writable memory > range, from /proc/PID/maps: > > 75b00000-75b21000 rw-p 00000000 00:00 0 > > This behavior looks very similar to this bug's description. > > If this same problem indeed also applies to 'lwx', is there a workaround for > it? This is observed with gcc 7.3.0.
Yes, it is the same problem, the same address calculation occurs here, and the lack of 32-bit address space wraparound is a part of the n32 Linux ABI, which implies support for processors that do not support such a wraparound in hardware (no CP0.Status.PX bit). You may try experimenting with ISA/ASE selection options, so that LWX is not considered a valid instruction by GCC. Otherwise I can't help with finding a workaround as I don't know one offhand and I'm not involved with MIPS development anymore, sorry. And neither is Doug BTW. This really ought to be fixed properly in GCC.
(In reply to James Cowgill from comment #0) > Before the ldxc1 instruction is executed, gdb reports that the values in v0 > and s0 are both large integers (above 0x80000000): > (gdb) print/x $v0 > $1 = 0xfffee7f8 > (gdb) print/x $s0 > $2 = 0x80008b50 > > When added together, the lower 32-bits contains the correct pointer (in this > case on the stack). On a 32-bit processor this is fine. > > On a 64-bit processor, we know that v0 and s0 are sign extended as the last > instructions to touch them were the addu at 924 and the subu at 8e0. So the > values in the registers are actually: > v0 = 0xfffffffffffee7f8 > s0 = 0xffffffff80008b50 > > Adding these together (modulo 64-bit) gives the final pointer of > 0xffffffff7fff7348 which is outside the user address space and thus results > in a SIGBUS. > > I think GCC is assuming that the address calculated by the ldxc1 instruction > is modulo 32-bit when compiled for a 32-bit processor. However, this is not > true if the code is later run on a 64-bit processor. Given the above GCC has to consider pointers (even integer regs!?) being 64bit even in 32bit mode since obviously semantics between 32bit and 64bit hardware differs. Thus as Eric says Pmode needs to be DImode and ptr_mode SImode for 32bit and DImode for 64bit. Everything else is going to lead to issues like the ones observed.
I don't remember the detail of this issue but I believe I was convinced that it is down to the lack of setting PX appropriately in HW. UX==0, PX==1. The PX control bit forces address calculations i.e. base + imm or base + reg to be performed with 32-bit rules but allows 64 instruction usage. Since there is a processor mode that is perfectly capable of meeting the requirements of a program with 64bit data and 32bit pointers then the solution is to set PX for N32 rather than UX.
(In reply to mpf from comment #29) > I don't remember the detail of this issue but I believe I was convinced that > it is down to the lack of setting PX appropriately in HW. UX==0, PX==1. The > PX control bit forces address calculations i.e. base + imm or base + reg to > be performed with 32-bit rules but allows 64 instruction usage. Since there > is a processor mode that is perfectly capable of meeting the requirements of > a program with 64bit data and 32bit pointers then the solution is to set PX > for N32 rather than UX. This would have to be done by the kernel when switching to an application, correct? And then only for n32 applications, not for n64 or others.
(In reply to Maciej W. Rozycki from comment #27) > Yes, it is the same problem, the same address calculation occurs here, > and the lack of 32-bit address space wraparound is a part of the n32 > Linux ABI, which implies support for processors that do not support such > a wraparound in hardware (no CP0.Status.PX bit). > > You may try experimenting with ISA/ASE selection options, so that LWX is > not considered a valid instruction by GCC. Otherwise I can't help with > finding a workaround as I don't know one offhand and I'm not involved > with MIPS development anymore, sorry. And neither is Doug BTW. > > This really ought to be fixed properly in GCC. I'm currently using -march=octeon3 or -march=octeon2 as appropriate. I'm not really confident in changing this, as there will be other impact too. As a quick workaround/test, I will try letting the '-mno-lxc1-sxc1' flag also control the use of the lwx and similar instructions, as follows: diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h index 23e1672b586..5dee3fbe29f 100644 --- a/gcc/config/mips/mips.h +++ b/gcc/config/mips/mips.h @@ -1194,17 +1194,17 @@ struct mips_cpu_info { /* ISA has lwxs instruction (load w/scaled index address. */ #define ISA_HAS_LWXS ((TARGET_SMARTMIPS || TARGET_MICROMIPS) \ - && !TARGET_MIPS16) + && !TARGET_MIPS16 && mips_lxc1_sxc1) /* ISA has lbx, lbux, lhx, lhx, lhux, lwx, lwux, or ldx instruction. */ -#define ISA_HAS_LBX (TARGET_OCTEON2) -#define ISA_HAS_LBUX (ISA_HAS_DSP || TARGET_OCTEON2) -#define ISA_HAS_LHX (ISA_HAS_DSP || TARGET_OCTEON2) -#define ISA_HAS_LHUX (TARGET_OCTEON2) -#define ISA_HAS_LWX (ISA_HAS_DSP || TARGET_OCTEON2) -#define ISA_HAS_LWUX (TARGET_OCTEON2 && TARGET_64BIT) +#define ISA_HAS_LBX (TARGET_OCTEON2 && mips_lxc1_sxc1) +#define ISA_HAS_LBUX ((ISA_HAS_DSP || TARGET_OCTEON2) && mips_lxc1_sxc1) +#define ISA_HAS_LHX ((ISA_HAS_DSP || TARGET_OCTEON2) && mips_lxc1_sxc1) +#define ISA_HAS_LHUX (TARGET_OCTEON2 && mips_lxc1_sxc1) +#define ISA_HAS_LWX ((ISA_HAS_DSP || TARGET_OCTEON2) && mips_lxc1_sxc1) +#define ISA_HAS_LWUX (TARGET_OCTEON2 && TARGET_64BIT && mips_lxc1_sxc1) #define ISA_HAS_LDX ((ISA_HAS_DSP || TARGET_OCTEON2) \ - && TARGET_64BIT) + && TARGET_64BIT && mips_lxc1_sxc1) /* The DSP ASE is available. */ #define ISA_HAS_DSP (TARGET_DSP && !TARGET_MIPS16)
>I'm currently using -march=octeon3 or -march=octeon2 as appropriate. Can you report this to Marvell (Cavium)? O32 was not used much on Octeon.
(In reply to Andrew Pinski from comment #32) > >I'm currently using -march=octeon3 or -march=octeon2 as appropriate. > > Can you report this to Marvell (Cavium)? O32 was not used much on Octeon. Yes, I will. However, please note that I am using N32, not O32.
(In reply to mpf from comment #29) > I don't remember the detail of this issue but I believe I was convinced that > it is down to the lack of setting PX appropriately in HW. UX==0, PX==1. The > PX control bit forces address calculations i.e. base + imm or base + reg to > be performed with 32-bit rules but allows 64 instruction usage. Since there > is a processor mode that is perfectly capable of meeting the requirements of > a program with 64bit data and 32bit pointers then the solution is to set PX > for N32 rather than UX. This is impractical because as I say Linux has to support processors that have no CP0.Status.PX bit and do have to rely on CP0.Status.UX instead. NB Richard, n32 is 64-bit mode, pretty much like x86's x32, except that invented some 20 years earlier. So regs are already DImode as are stack slots, etc.
So presumably the actual solution for n32 would be the same as with x32 and SIB, which IIUC cannot rely on hardware wrapping around the address space either.
GCC 8.4.0 has been released, adjusting target milestone.