Looking at msmpeg4.s from PR 41567, build with -O2 -fPIC, we see .LC2: data8 mv_tables#+72 ... addl r14 = @gprel(.LC2), gp We should not have spilled that address to the constant pool. Add -mno-sdata and it gets worse -- LC2 moves from .sdata to the .data.rel.ro section but we *still* use @gprel to address it, and that relocation may well be out of range.
I once asked Jim Wilson about this but didn't get an answer. Here is a pointer to that email. Also included here is a short example that generates the gprel load. My earlier example and question can be seen in: http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00170.html Short test case: typedef struct table { int a; int b; int c; } table; extern table mv_tables[2]; void foo() { init_mv_table(&mv_tables[1]); }
Subject: Re: [ia64] Inappropriate address spills This broke between gcc-4.0.0 and gcc-4.1.2. It appears to be the patch for PR 28490. There is a test in ia64_legitimate_constant_p for symbol +offset, where we require that the offset have the low 14-bits be zero which is not at all what we intended to test for. This should be a test for an offset that fits into a short add immediate instruction, which takes a signed-14 bit value, aka CONST_OK_FOR_I, or, hmm, let's look that up, I guess that would be satisfies_constraint_I nowadays. Except that satisfies_constraint_I wants an rtx, and we have an integer, so it might be simpler to just substitute in the right arithmetic for the constant check. I'll attach an untested patch that works for the testcase. Jim
Created attachment 20097 [details] Untested patch that works for sje's testcase
Subject: Re: [ia64] Inappropriate address spills Or maybe we should just accept any constant here? I tried that, and for typedef struct table { int a; int b; int c; } table; extern table mv_tables[100000]; void foo() { init_mv_table(&mv_tables[50000]); } I get addl r35 = @ltoff(mv_tables#+606208), r1 ;; ld8 r35 = [r35] ;; adds r35 = -6208, r35 which is perhaps still better than using @gprel and the constant pool. In this case, in ia64_legitimate_constant_p we can just return true if aligned_offset_symbol_operand is true, and there is no need for the addend local variable. Jim
Subject: Re: [ia64] Inappropriate address spills On third thought... The code here makes sense if we were having problems with invalid constant recombinations. symbol+const gets split by ia64_expand_move into symbol+highpart which is put in the got, and lowpart which is added with adds. If a late optimization pass saw a REG_EQUAL note and tried to put the constant back together again, that would be inconvenient. The ia64_legitimate_constant_p code would prevent that. Unfortunately, this code also prevents us to accepting a constant before ia64_expand_move is called, which prevents us from splitting it in the first place. This looks to be more complicated than I hoped. There is a variable we can check to see if we are in the expand phase, currently_expanding_to_rtl. Perhaps we need to accept any constant if currently_expanding_to_rtl is true, and only accept symbol+highpart (the current code) if it is false. Or alternatively, if this problem only happened at reload time, then checking reload_in_progress and/or reload_completed might work. I will have to look at PR28490 again and try to remember what the original problem that we were fixing was. Jim
I can reproduce the original problem with 28490 by changing the line in ia64_legitimate_constant_p from return (addend & 0x3fff) == 0; to return true; and compiling the testcase with -O. What happens is that in reload.c we see a REG_EQUAL note with symbol+8 and put it in reg_equiv_constant because it is accepted by LEGITIMATE_CONSTANT_P. This then gets substituted into a SET_SRC. Except that this insn now requires a scratch register, and we are after reload, so we end up aborting. The traditional solution for this is to define LEGITIMATE_PIC_OPERAND_P so that it rejects symbol+const when it would require a scratch register. See the sparc port for instance. However, this works only if flag_pic is set. IA-64 is PIC always, and we use flag_pic for shared libraries, so we can't just set it. Or at least, I'm not sure what will happen if we set it, and I don't want to spend that much time looking into it. Another traditional solution is to rewrite pic symbols in such a way that they can't be recognized as constants anymore, for instance, by putting them inside unspecs. This is ugly, and I only mention it for completeness. I don't recommend fixing the problem this way. It might be tempting to check for reload_in_progress, but unfortunately the LEGITIMATE_CONSTANT_P check in reload1.c comes before reload_in_progress is set, so that won't work. And messing with reload isn't a good idea, so we shouldn't try moving where reload_in_progress is set. That leaves the currently_expanding_to_rtl solution. We can accept any constant if that var is true, and we use the current code if that var is false. I think this is the second best solution after the LEGITIMATE_PIC_OPERAND_P solution. I tried this, and found that it is an imperfect solution. I don't get a core dump for 28490, and I don't get a constant pool entry for the 42040 testcase. Unfortunately, I do now get a constant pool entry for the 28490 testcase. Because symbol+8 is no longer a LEGITIMATE_CONSTANT_P, reload decides to call force_const_mem and put it in reg_equiv_memory_loc. So this solution doesn't completely eliminate the constant pool entries, but it will eliminate most of them. To eliminate all of them, we would have to get the LEGITIMATE_PIC_OPERAND_P solution working.
Created attachment 20106 [details] untested patch, imperfect solution
I tried the patch and didn't have any problem bootstrapping and I didn't see any regressions. It also fixed my small test case, but when I went back and tried some of the other tests from PR 28490 I still saw some of the bad gprel usages. Here is a slightly more cutdown testcase from 28490 that still fails with the patch applied and when compiling with -O2. extern const char basesyntax[]; extern const char arisyntax[]; typedef struct __jmp_buf_tag { } jmp_buf[1]; struct jmploc { jmp_buf loc; }; readtoken1 (int firstc, char const *syntax, char *eofmark, int striptabs) { int c = firstc; for (;;) { switch (syntax[c]) { case 1: if (syntax == (basesyntax + 130)) goto endword; syntax = (basesyntax + 130); parsebackq_oldreturn:; } } endword: if (syntax == (arisyntax + 130)) { struct jmploc jmploc; if (_setjmp (jmploc.loc)) goto parsebackq_oldreturn; } }
Subject: Re: [ia64] Inappropriate address spills On Wed, 2010-03-17 at 22:09 +0000, sje at cup dot hp dot com wrote: > I tried the patch and didn't have any problem bootstrapping and I didn't see > any regressions. It also fixed my small test case, but when I went back and > tried some of the other tests from PR 28490 I still saw some of the bad gprel > usages. Here is a slightly more cutdown testcase from 28490 that still fails > with the patch applied and when compiling with -O2. I already mentioned in my previous message that the patch does not eliminate all instances of the gprel usages (constant-pool references). It just eliminates most of them. If you want to eliminate all of them, then you need to get LEGITIMATE_PIC_OPERAND_P working, which in turn will require setting flag_pic by default. which in turn will require verifying that this doesn't break anything. That is probably more work than I have time for. And by "fails", you mean that the compiler is still emitting undesirable code in some cases. The code isn't incorrect, just non-optimal. And we get less non-optimal code with the patch, so it seems to be better than nothing. Unless you want to try to write the better LEGITIMATE_PIC_OPERAND_P solution. Jim
Reading Richard's initial comment I thought the problem was that the code was (or could be) illegal because the relocation may be out of range and we shouldn't use the gprel relocation for any of these constant pool references.
Subject: Re: [ia64] Inappropriate address spills On Wed, 2010-03-17 at 23:47 +0000, sje at cup dot hp dot com wrote: > Reading Richard's initial comment I thought the problem was that the code was > (or could be) illegal because the relocation may be out of range and we > shouldn't > use the gprel relocation for any of these constant pool references. The code is invalid only if you use -mno-sdata. The only reason why Richard wanted to use -mno-sdata is because he saw the non-optimal code, and hoped that -mno-sdata would fix that. It didn't. It just made the problem worse by generating invalid code. So there is probably also a bug here in the handling of -mno-sdata, but fixing that doesn't help Richard, because it doesn't solve the real problem he was complaining about, which was the non-optimal code. I suspect that the -mno-sdata bug is that ia64_expand_load_address does else if (sdata_symbolic_operand (src, VOIDmode)) emit_insn (gen_load_gprel (dest, src)); and sdata_symbolic_operand fails to check TARGET_NO_SDATA, so it assumes that constant pool entries are still in the sdata section, and we end up with gprel references to items that aren't in the sdata section anymore, which will fail at link time. If we fix that, -mno-sdata will work, but we will still have all of the inappropriate address spills (i.e. non-optimal code) which is what the PR summary is complaining about. Jim
Since the proposed patch to meant to address non-optimimal code generation I decided to try the patch with SPEC2006 and see if it helped the performance. On SPECint, I got a 3% slowdown, mostly due to 429.mcf slowing down a lot (20%). 471.omnetap also slowed down (7%) and nothing else changed much. With SPECfp, we got a 0.36% speedup, with 434.zeusmp (4.4%) and 459.GemsFTDT (3.39%) speeding up the most and no significant slowdowns. I will try to see what happened with 429.mcf and see why that slowed down so much.
Subject: Re: [ia64] Inappropriate address spills On Mon, 2010-03-22 at 20:48 +0000, sje at cup dot hp dot com wrote: > Since the proposed patch to meant to address non-optimimal code generation I > decided to try the patch with SPEC2006 and see if it helped the performance. This isn't a simple issue, performance wise. If the compiler puts an address+offset into sdata, then we get something like .sdata .align 8 .LC2: data8 mv_tables#+72 .text addl r14 = @gprel(.LC2), gp ;; ld8 r40 = [r14] If the compiler emits the code that the ABI intended here, we get something like addl r14 = @ltoff(mv_tables#), r1 ;; ld8 r14 = [r14] ;; adds r40 = 72, r14 This sequence is one instruction longer. We deliberately do this to avoid accidentally overflowing the got. If you access the same array with a thousand different offsets, then instead of putting a thousand entries into the got, we put one entry in the got, and then add in the offset. So this is a smaller got, but an extra add instruction that may add to latency if we can't hide it. This code is working for got entries, but not when an address is forced into constant pool. So which one is better depends on what the code looks like. If you have a small number of variables and a large number of offsets, then you will get a lot more memory references and a larger got with the first alternative. If you have a large number of variables and a small number of offsets, then you will get more add instructions with the second alternative. This issue originally came up via PR 41567, which triggers a linker error. Perhaps that testcase would not have triggered the linker error if we were using the second alternative. I haven't looked at this PR as yet, so I don't know the details of what is going on here. Jim
Well it looks like the big SPEC slowdowns were a glitch. The generated code is only slightly different and when I tried to reproduce the results I saw a slight speed up in mcf instead of a slowdown. I still got an overall slowdown in SPEC, but it was only 0.25%.