This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
The attached testcase fails on 3.4 with: mips64el-elf-gcc -O2 inherit.c -Tidt.ld I haven't yet found a similar testcase for mainline. The root cause is a bug in reload inheritance. We have rtl like this: A: (set (reg:QI R2) (subreg:QI (reg:HI R1))) B: (set (reg:DI R3) (ashift:DI (subreg:DI (reg:QI R2) 0) (const_int 58))) ... later insns which use R1 ... in which R1 is allocated to a general register (G1) and R2 is spilled to the stack. R2's stack slot naturally satisfies the destination constraints of insn A so there are no reloads of R2 for insn B to inherit. choose_reload_regs() therefore tries to inherit using the find_equiv_reg() route instead. find_equiv_reg() finds insn A and rightly says that (reg:QI R2) is equivalent to (subreg:QI (reg:HI G1) 0). As expected, the right hand side gets simplified to (reg:QI G1): else if (GET_CODE (equiv) == SUBREG) { /* This must be a SUBREG of a hard register. Make a new REG since this might be used in an address and not all machines support SUBREGs there. */ regno = subreg_regno (equiv); equiv = gen_rtx_REG (rld[r].mode, regno); } and then there are various checks to see whether (reg:QI G1) can be inherited here. All these checks say it can, so an inheritance is duly recorded: /* If we found an equivalent reg, say no code need be generated to load it, and use it as our reload reg. */ if (equiv != 0 && (regno != HARD_FRAME_POINTER_REGNUM || !frame_pointer_needed)) { ... rld[r].reg_rtx = equiv; reload_inherited[r] = 1; ... } However, the inheritance gets rejected by the next loop: if (! free_for_value_p (true_regnum (check_reg), rld[r].mode, rld[r].opnum, rld[r].when_needed, rld[r].in, (reload_inherited[r] ? rld[r].out : const0_rtx), r, 1)) { if (pass) continue; reload_inherited[r] = 0; reload_override_in[r] = 0; } because reload_reg_free_for_value_p() checks: if (TEST_HARD_REG_BIT (reload_reg_unavailable, regno)) return 0; and because G1 is included in reload_reg_unavailable (R1/G1 being live throughout insn B). Now that the code above has cancelled the inheritance, we're left with an _uninherited_ reload from (reg:QI R2) to (reg:QI G1). Although such a reload correctly preserves the low byte of (reg:HI R1/G1), it clobbers the high byte, causing trouble for any later uses of the register. In this particular case, I suspect reload_reg_free_for_value_p() is being too conservative. There doesn't seem to be any reason why we can't inherit a value that is invariant throughout the instruction. [ FWIW, the original form of the reload_reg_unavailable check comes from: http://gcc.gnu.org/ml/gcc-patches/1999-11n/msg00333.html in which the cited reason was to treat asm clobbers as earlyclobbers. The current form of the check is due to: http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00386.html ] However, the real problem seems to be that the checks for find_equiv_reg() inheritance are too loose. They only test free_for_value_p() when inheriting a spill register: /* If we found a spill reg, reject it unless it is free and of the desired class. */ if (equiv != 0) { int regs_used = 0; int bad_for_class = 0; int max_regno = regno + rld[r].nregs; for (i = regno; i < max_regno; i++) { regs_used |= TEST_HARD_REG_BIT (reload_reg_used_at_all, i); bad_for_class |= ! TEST_HARD_REG_BIT (reg_class_contents[(int) rld[r].class], i); } if ((regs_used && ! free_for_value_p (regno, rld[r].mode, rld[r].opnum, rld[r].when_needed, rld[r].in, rld[r].out, r, 1)) || bad_for_class) equiv = 0; } and on the face of it, just checking spill registers could lead to the same sort of problem that brought about the patch linked above. On the other hand, calling free_for_value_p() unconditionally might be going too far the other way: it would check lower-priority reloads whose reg_rtxes haven't been finalised yet. (They might change later due to inheritance.) Perhaps an obvious fix would be to make the "regs_used/bad_for_class |=" loop reject the inheritance if a register is in reload_reg_unavailable. Unfortunately, this seems to kill most of the advantage of the code. I tried three versions of reload1.c: (a) Current mainline. (b) Current mainline with the fix just suggested. (c) Current mainline with find_equiv_reg() inheritance disabled altogether. Compiling CSiBE on mips64-elf at -O2 showed the following differences between (a) and (b): parser 5976 5984 : 100.13% parse 10384 10392 : 100.08% tblcmp 5168 5184 : 100.31% src/lpgparse 56824 56840 : 100.03% src/prntstat 4216 4224 : 100.19% src/remsp 9344 9352 : 100.09% src/resolve 15560 15576 : 100.10% jcsample 3248 3256 : 100.25% jfdctint 1424 1440 : 101.12% jdphuff 4360 4368 : 100.18% jdcoefct 5152 5168 : 100.31% jidctint 2416 2408 : 99.67% jquant2 6136 6152 : 100.26% rdtarga 3096 3104 : 100.26% transupp 6544 6552 : 100.12% mspack/cabd 9168 9176 : 100.09% mspack/lzxd 8968 8976 : 100.09% test/cabextract_md5 33420 33428 : 100.02% test/md5 4496 4504 : 100.18% fs/jbd/recovery 2824 2832 : 100.28% fs/nfs/read 3200 3208 : 100.25% fs/ext3/inode 17352 17344 : 99.95% fs/select 3320 3328 : 100.24% mm/memory 10568 10576 : 100.08% mm/filemap 20048 20056 : 100.04% mm/vmscan 3352 3360 : 100.24% mm/page_alloc 6392 6416 : 100.38% mm/shmem 9352 9360 : 100.09% ipc/sem 6328 6336 : 100.13% net/ipv4/route 17540 17556 : 100.09% net/ipv4/ip_output 7588 7596 : 100.11% net/sunrpc/xprt 9428 9452 : 100.25% drivers/block/ll_rw_blk 9760 9768 : 100.08% kernel/ptrace 1736 1744 : 100.46% src/core/tcp_output 3800 3808 : 100.21% src/netif/tcpdump 1800 1808 : 100.44% dump 2004 2012 : 100.40% src/echo/renderEcho 5744 5752 : 100.14% src/ell/test/tq 4280 4288 : 100.19% src/gage/sclanswer 5408 5416 : 100.15% src/hest/usage 4520 4536 : 100.35% src/mite/txf 9200 9208 : 100.09% src/moss/xform 3456 3480 : 100.69% src/nrrd/filt 7504 7512 : 100.11% src/nrrd/reorder 13480 13488 : 100.06% src/nrrd/cc 15216 15240 : 100.16% src/ten/glyph 5928 5936 : 100.13% src/ten/tendSatin 5448 5456 : 100.15% src/ten/tendGlyph 8192 8200 : 100.10% ---------------------------------------------------------------- Total 3939049 3939537 : 100.01% and these differences between (b) and (c): blocksort 8608 8616 : 100.09% src/resolve 15576 15600 : 100.15% jquant2 6152 6160 : 100.13% pnm2png 4272 4264 : 99.81% drivers/char/n_tty 9472 9464 : 99.92% src/nrrd/measure 13580 13588 : 100.06% src/nrrd/resampleNrrd 9048 9040 : 99.91% src/nrrd/cc 15240 15232 : 99.95% src/ten/chan 9904 9912 : 100.08% ---------------------------------------------------------------- Total 3939537 3939561 : 100.00% A spot check shows that most of (c)->(b) benefit comes from two things: (1) Secondary reloads. Suppose we have a secondary reload: hardreg1 -> hardreg2 -> pseudoN reg_last_reload_reg[pseudoN] will be hardreg1, so that's what "normal" reload inheritance will try to use. OTOH, find_equiv_reg() will find the move from hardreg2 to pseudoN and therefore try to inherit hardreg2 instead. In many cases, hardreg2 is likely to belong to a more general class than hardreg1, in which case there's often more chance of it being suitable. (2) Situations like: (set (reg hardreg) (thing)) ... (set (reg pseudoreg) (thing)) where (reg pseudoreg) is spilled to the stack. Although no inheritance takes place for the second insn, the find_equiv_reg() code will choose "hardreg" for the reload of "pseudoreg". This gives more chance for clean-up later. But as the (b)->(a) results show, most of the benefit of the current code comes from cases where we can't inherit the reload. In fact the typical situation seems to very much like the one that's causing trouble. We have sequences like: (set (reg:SI R2) (reg:SI R1)) ... (set (reg:SI R3) (... (reg:SI R2) ...)) where, as above, R1 is allocated a hard register G1 and R2 is spilled to the stack. We again choose to use G1 for the input reload in the second instruction, at first thinking that we can inherit the value from the first insn. This doesn't happen, so we end up with a reload: (set (reg:SI R2) (reg:SI G1)) ... (set (reg:SI G1) (reg:SI R2)) <---- (set (reg:SI R3) (... (reg:SI G1) ...)) but the post-reload optimisers can remove it. When no subregs are involved, this procedure should be safe, since we're simply reloading the value that the register already had. It was still surprising to me that all this find_equiv_reg() stuff doesn't seem to be benefiting from true inheritance that much. It certainly doesn't seem to be explained that way in the comments. Anyway, the patch below fixes the testcase and produces no change in CSiBE score. I've also bootstrapped & regression tested it on i686-pc-linux-gnu (mainline). I'm not 100% certain that this is the right approach though. By the way, I think both normal and paradoxical subregs need this treatment. Paradoxical subregs need it because of things like TRULY_NOOP_TRUNCATION. Richard
Attachment:
reg-equiv-inherit.diff
Description: Text document
#define MULTI1(X) \ X( 0), X( 1), X( 2), X( 3), X( 4), X( 5), X( 6), X( 7), X( 8), X( 9), \ X(10), X(11), X(12), X(13), X(14), X(15), X(16), X(17), X(18), X(19), \ X(20), X(21), X(22), X(23), X(24), X(25), X(26), X(27), X(28), X(29) #define COPY1(INDEX) c1[INDEX] = g[INDEX].s #define COPY2(INDEX) c2[INDEX] = g[INDEX].bits.x1 + g[0].s #define COPY3(INDEX) c3[INDEX] = g[INDEX].bits.x2 + g[0].s #define COPY4(INDEX) c4[INDEX] = g[INDEX].s union { struct { short x1:3, x2:3; } bits; short s; } g[30]; volatile short c1[30], c2[30], c3[30], c4[30]; int main () { int i; for (i = 0; i < 30; i++) g[i].s = 0x7fff; MULTI1 (COPY1); MULTI1 (COPY2); MULTI1 (COPY3); MULTI1 (COPY4); for (i = 0; i < 30; i++) if (c4[i] != c1[i]) abort (); exit (0); }
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |