Like PR 33790, dse.c could handle the case mentioned in there. The current issue is that it does: if (GET_MODE_CLASS (read_mode) != GET_MODE_CLASS (store_mode)) return false; And the clases are different, one is the vector float class (V4SF) and the other is the vector integer class (V4SI) (or just integer class [TI], spu). Testcase where an extra store/load exists: #define vector __attribute__((__vector_size__(16) )) typedef vector float vec_float4; typedef struct { vec_float4 data; } VecFloat4; typedef struct { vec_float4 a; vec_float4 b; } VecFloat4x2; VecFloat4 test1(VecFloat4 a, VecFloat4 b) { a.data = a.data+b.data; return a; } VecFloat4x2 test2(VecFloat4x2 data) { data.a = data.a+data.a; data.b = data.b+data.b; return data; }
This is one of the issues that I originally tried to fix for 4.3: http://gcc.gnu.org/ml/gcc-patches/2007-10/msg01661.html but that was too invasive for stage 3. I resubmitted it after 4.4 opened: http://gcc.gnu.org/ml/gcc-patches/2008-02/msg01227.html and pinged it yesterday: http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00537.html I'll add the PR number to the change log.
I tested the latest patch on ppc-32 and ppc-64 and there were no regressions. i did have trouble applying the patch. The second frag of the update for the test case did not apply.
Subject: Bug 33927 Author: rsandifo Date: Sat Mar 22 19:37:53 2008 New Revision: 133452 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=133452 Log: gcc/ PR rtl-optimization/33927 * Makefile.in (dse.o): Depend on $(TM_P_H). * expr.h (extract_low_bits): Declare. * expmed.c (extract_low_bits): New function. * rtlhooks.c (gen_lowpart_general): Generalize SUBREG handling. * dse.c: Include tm_p.h. (find_shift_sequence): Remove the read_reg argument and return the read value. Emit the instructions instead of returning them. Iterate on new_mode rather than calculating it each time. Check MODES_TIEABLE_P. Use simplify_gen_subreg to convert the source to NEW_MODE and extract_low_bits to convert the shifted value to READ_MODE. (replace_read): Allow the load and store to have different mode classes. Use extract_low_bits when SHIFT == 0. Create the shift or extraction instructions before trying the replacement. Update dump-file code accordingly, avoiding use of REGNO (store_info->rhs). gcc/testsuite/ * gcc.target/mips/dse-1.c: Add checks for zeros. Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/dse.c trunk/gcc/expmed.c trunk/gcc/expr.h trunk/gcc/rtlhooks.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/mips/dse-1.c
I've applied a patch that should fix this. No-one's confirmed whether it does though, so I'm marking the PR as waiting.
(In reply to comment #4) > I've applied a patch that should fix this. No-one's confirmed > whether it does though, so I'm marking the PR as waiting. And I just confirmed, it does fix it, on powerpc64-darwin: --- old.s 2008-03-22 17:47:53.000000000 -0700 +++ new.s 2008-03-22 17:47:18.000000000 -0700 @@ -24,22 +24,15 @@ _test2: stw r0,-4(r1) oris r0,r0,0xc000 mtspr 256,r0 - li r0,48 - li r9,64 - vaddfp v2,v2,v2 - addi r2,r1,16 - addi r11,r1,48 + vaddfp v0,v2,v2 + addi r2,r1,48 lwz r12,-4(r1) - vaddfp v3,v3,v3 - stvx v2,0,r11 - stvx v3,r2,r0 - lvx v0,0,r11 - lvx v1,r2,r0 - addi r2,r1,-128 - stvx v0,r2,r0 - stvx v1,r2,r9 - lvx v2,r2,r0 - lvx v3,r2,r9 + vaddfp v1,v3,v3 + vor v2,v0,v0 + vor v3,v1,v1 + stvx v0,0,r2 + addi r2,r1,64 + stvx v1,0,r2 mtspr 256,r12 blr .subsections_via_symbols Thanks for the fix Richard.
(In reply to comment #5) > (In reply to comment #4) > > I've applied a patch that should fix this. No-one's confirmed > > whether it does though, so I'm marking the PR as waiting. > And I just confirmed, it does fix it, on powerpc64-darwin: It also fixes the problem on spu-gcc, but on powerpc64-linux it seems that the problem still exists: ----------------------------------- On spu-gcc: test2: - hbr .L5,$lr + fa $4,$4,$4 stqd $sp,-128($sp) + fa $3,$3,$3 ai $sp,$sp,-128 - stqd $3,64($sp) - stqd $4,80($sp) - lqd $4,80($sp) - lqd $5,64($sp) - fa $3,$4,$4 - fa $2,$5,$5 - stqd $3,48($sp) - stqd $2,32($sp) - lqd $4,48($sp) - lqd $3,32($sp) ai $sp,$sp,128 -.L5: bi $lr ----------------------------------- On powerpc64-linux : .file "struct1.c" .section ".text" .align 2 .p2align 4,,15 .globl test1 .type test1, @function test1: stwu 1,-32(1) mfvrsave 0 stw 0,28(1) oris 0,0,0xc000 mtvrsave 0 lvx 0,0,4 lvx 1,0,5 vaddfp 0,0,1 lwz 12,28(1) stvx 0,0,3 mtvrsave 12 addi 1,1,32 blr .size test1,.-test1 .align 2 .p2align 4,,15 .globl test2 .type test2, @function test2: stwu 1,-32(1) mfvrsave 0 stw 0,28(1) oris 0,0,0xc000 mtvrsave 0 li 0,16 lvx 1,0,4 lvx 0,4,0 vaddfp 1,1,1 vaddfp 0,0,0 lwz 12,28(1) stvx 1,0,4 stvx 0,4,0 stvx 0,3,0 stvx 1,0,3 mtvrsave 12 addi 1,1,32 blr .size test2,.-test2 .ident "GCC: (GNU) 4.4.0 20080323 (experimental)" .section .note.GNU-stack,"",@progbits
(In reply to comment #6) > It also fixes the problem on spu-gcc, but on powerpc64-linux > it seems that the problem still exists: No, the problem does not still exist, what exist is an extra store and that is really recorded as PR 30271. Thanks, Andrew Pinski