This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Hi! This patch fixes a bunch of bugs in i386.c memset and memcpy expanders. The patch has been bootstrapped on x86_64-linux, i686-linux, i686-linux --with-tune=pentium-m, i686-linux --with-arch=pentium-m and regtested on the first tree with: make -j48 -k check RUNTESTFLAGS="--target_board=unix/'{`echo -mstringop-strategy={libcall,rep_{,4,8}byte,{byte_,,unrolled_}loop} | sed s0\ 0,0g`,-m64}'" on the second tree with: make -j48 -k check RUNTESTFLAGS="--target_board=unix/'{`echo -mstringop-strategy={libcall,rep_{,4}byte,{byte_,,unrolled_}loop} | sed s0\ 0,0g`,-m32}'" and on the third tree with: make -j48 -k check RUNTESTFLAGS="--target_board=unix/-mtune=pentium-m/'{`echo -mstringop-strategy={libcall,rep_{,4}byte,{byte_,,unrolled_}loop} | sed s0\ 0,0g`,-m32}'" Compared to normal x86_64-linux or i686-linux checking with no special options the following extra FAILs still appear: {{-m64,-m32,-m32/-mtune=pentium-m}/-mstringop-strategy={{byte_,,unrolled_}loop,rep_{,4}_byte},-m64/-mstringop-strategy=rep_8byte} FAIL: gcc.dg/visibility-11.c scan-assembler memcpy@PLT {-m64,-m32,-m32/-mtune=pentium-m}/-mstringop-strategy={byte_loop,libcall} FAIL: gcc.target/i386/cold-attribute-1.c scan-assembler stosb {-m64,-m32,-m32/-mtune=pentium-m}/-mstringop-strategy=libcall FAIL: gcc.target/i386/inline-mcpy.c scan-assembler-not memcpy {-m64,-m32,-m32/-mtune=pentium-m}/-mstringop-strategy={,unrolled_}loop FAIL: gcc.target/i386/pr11001-memset-1.c (internal compiler error) FAIL: gcc.target/i386/pr11001-memset-1.c (test for excess errors) FAIL: gcc.target/i386/pr11001-memset-3.c (internal compiler error) FAIL: gcc.target/i386/pr11001-memset-3.c (test for excess errors) {-m32,-m32/-mtune=pentium-m}/-mstringop-strategy={{byte_,,unrolled_}loop,libcall} FAIL: gcc.target/i386/memcpy-1.c scan-assembler rep FAIL: gcc.target/i386/memcpy-1.c scan-assembler movs -m32/-mtune=pentium-m{,/-mstringop-strategy={{byte_,,unrolled_}loop,rep_{,4}_byte}} FAIL: gcc.dg/tree-ssa/loop-28.c scan-tree-dump-times final_cleanup "MEM" 65 FAIL: gcc.dg/tree-ssa/prefetch-3.c scan-tree-dump-times aprefetch "unroll factor 4" 1 FAIL: gcc.misc-tests/i386-pf-none-1.c scan-assembler-not fetch FAIL: gcc.misc-tests/i386-pf-none-1.c scan-assembler-not fetch FAIL: gcc.target/i386/addr-sel-1.c scan-assembler a\\\\+1 FAIL: gcc.target/i386/addr-sel-1.c scan-assembler b\\\\+1 FAIL: gcc.target/i386/bt-1.c scan-assembler btl[ \\t] FAIL: gcc.target/i386/bt-2.c scan-assembler btl[ \\t] FAIL: gcc.target/i386/bt-mask-1.c scan-assembler-not and[lq][ \\t] FAIL: gcc.target/i386/bt-mask-2.c scan-assembler-not and[lq][ \\t] FAIL: gcc.target/i386/cmov1.c scan-assembler sar[^\\\\n]*magic_namec FAIL: gcc.target/i386/cmov1.c scan-assembler shr[^\\\\n]*magic_namef FAIL: gcc.target/i386/movq-2.c (test for excess errors) FAIL: gcc.target/i386/movq.c (test for excess errors) FAIL: gcc.target/i386/mul.c scan-assembler and[^\\\\n]*magic FAIL: gcc.target/i386/pr30315.c scan-assembler-times cmp 4 FAIL: gcc.target/i386/xchg-1.c scan-assembler rol FAIL: gcc.target/i386/xchg-2.c scan-assembler xchgb but they don't seem to be regressions. pr11001-memset-{1,3}.c is discussed above, the rest of -mstringop-strategy=something only match failures, unless it can be reproduced with just some -mtune= option, can be IMHO ignored and the -m32 -mtune=pentium-m specific failures also look just like testcases not expecting -mtune=pentium-m. The bugs fixed: 1) -m32 -mstringop-strategy=rep_8byte ICEd on all non-libcall memcpy/memset expansions (obviously, as rep; stosq isn't a 32-bit insn) 2) if chosen algorithm need_zero_guard, desired alignment was bigger than actual and count was constant, but forced into register, we could store/copy too many bytes, as need_zero_guard comparison was only done if !count. Say for count 9, size_needed 8, desired_align 8, align 1 (and, not known at compile time, (dst & 7) == 1, we would store 7 bytes in the prologue and then jump into the main loop, so store another 8 bytes, instead of jumping directly to the epilogue 3) for size_needed 1, desired_align 8, and very small sizes movmem jumps around prologue (aligning of dst), but unlike size_needed > 2 where it jumps to the epilogue, it jumps to the main algorithm. But as epilogue_size_needed wasn't reset to 1, epilogue was executed afterwards, storing extra bytes 4) for setmem, size_needed 1, desired_align 8, non-constant value, very small sizes, we'd jump around prologue and promoted_val initialization directly to the main algorithm, which used (subreg:QI (promoted_val) 0). But as promoted_val wasn't initialized in that codepath, we'd store uninitialized value 5) setmem epilogue for desired_align - align > size_needed was called with not a power of 2 value, using that value - 1 as a mask for loops resulted in wrong epilogues 6) movmem for constant small sizes, desired_align > align and size_needed 1 would jump directly to the epilogue, not the main algorithm, which is shorter. When count is smaller than epilogue_size_needed, we just want to do rep; stosb immediately without aligning (or the 1 byte storing loop). 7) setmem for constant sizes could ICE (regression introduced by my recent setmem changes) trying to and a CONST_INT count_exp with a constant mask. Other bugs, not fixed by this patch: 1) pr11001-memset-{1,3}.c ICE with certain -mstring-strategy= switches (see above) ICEs, because decide_alg only considers the main algorithm when deciding if an algorithm is usable when some needed hard regs are fixed. But even for algorithms that don't use fixed regs, the epilogue still uses gen_strset or gen_strmov and those on some architectures still use stos* or movs* insns. If we care about it, I guess we'd need to force loopy epilogue in that case, or avoid using strmov/strset. 2) in some cases for -Os and small sizes memset by pieces was rejected as too large, but the actual expanded sequence looked much bigger. Unfortunately I haven't noted which exact parameters that was, will try to see if I can reproduce it again. Attached are also two testcases I've used to verify it (e.g. offset and length both known at compile time isn't exhaustively tested for small sizes), but they are perhaps too expensive for make check, even on a fast box they can take up to 16 minutes of compile time (runtime is almost instant) with some options. I've tested them both with -O{0,1,2,3,s} {-m32 {,-mtune=pentiumpro},-m64} for all valid -mstringop-strategy= options (plus without that option). Ok for trunk? Jakub
Attachment:
Y191
Description: Text document
Attachment:
memset-4.c
Description: Text document
Attachment:
memcpy-4.c
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |