33927 – replace_read in dse.c could handle cases where GET_MODE_CLASS (read_mode) != GET_MODE_CLASS (store_mode) (and the size is the same)

Bug 33927 - replace_read in dse.c could handle cases where GET_MODE_CLASS (read_mode) != GET_MODE_CLASS (store_mode) (and the size is the same)

Summary: replace_read in dse.c could handle cases where GET_MODE_CLASS (read_mode) != ...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	4.3.0

Importance:	P3 enhancement
Target Milestone:	4.4.0
Assignee:	Not yet assigned to anyone

URL:	http://gcc.gnu.org/ml/gcc-patches/200...
Keywords:	missed-optimization, patch

Depends on:
Blocks:	35838
	Show dependency tree / graph

Reported:	2007-10-27 18:27 UTC by Andrew Pinski
Modified:	2008-04-12 22:35 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:	spu-elf, powerpc64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:	2008-03-09 16:51:46

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrew Pinski 2007-10-27 18:27:04 UTC

Like PR 33790, dse.c could handle the case mentioned in there.  The current issue is that it does:
  if (GET_MODE_CLASS (read_mode) != GET_MODE_CLASS (store_mode))
    return false;

And the clases are different, one is the vector float class (V4SF) and the other is the vector integer class (V4SI) (or just integer class [TI], spu). 

Testcase where an extra store/load exists:

#define vector __attribute__((__vector_size__(16) ))

typedef vector float vec_float4;
typedef struct {
        vec_float4 data;
} VecFloat4;

typedef struct {
        vec_float4 a;
        vec_float4 b;
} VecFloat4x2;


VecFloat4 test1(VecFloat4 a, VecFloat4 b)
{
        a.data = a.data+b.data;
        return a;
}


VecFloat4x2 test2(VecFloat4x2 data)
{
        data.a = data.a+data.a;
        data.b = data.b+data.b;
        return data;
}

Comment 1 Richard Sandiford 2008-03-09 16:51:46 UTC

This is one of the issues that I originally tried to fix for 4.3:

    http://gcc.gnu.org/ml/gcc-patches/2007-10/msg01661.html

but that was too invasive for stage 3.  I resubmitted it after 4.4 opened:

    http://gcc.gnu.org/ml/gcc-patches/2008-02/msg01227.html

and pinged it yesterday:

    http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00537.html

I'll add the PR number to the change log.

Comment 2 Kenneth.Zadeck@NaturalBridge.com 2008-03-10 01:48:40 UTC

I tested the latest patch on ppc-32 and ppc-64 and there were no regressions.

i did have trouble applying the patch.  The second frag of the update for the test case did not apply.

Comment 3 Richard Sandiford 2008-03-22 19:38:48 UTC

Subject: Bug 33927

Author: rsandifo
Date: Sat Mar 22 19:37:53 2008
New Revision: 133452

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=133452
Log:
gcc/
	PR rtl-optimization/33927
	* Makefile.in (dse.o): Depend on $(TM_P_H).
	* expr.h (extract_low_bits): Declare.
	* expmed.c (extract_low_bits): New function.
	* rtlhooks.c (gen_lowpart_general): Generalize SUBREG handling.
	* dse.c: Include tm_p.h.
	(find_shift_sequence): Remove the read_reg argument and return the
	read value.  Emit the instructions instead of returning them.
	Iterate on new_mode rather than calculating it each time.
	Check MODES_TIEABLE_P.  Use simplify_gen_subreg to convert the
	source to NEW_MODE and extract_low_bits to convert the shifted
	value to READ_MODE.
	(replace_read): Allow the load and store to have different mode
	classes.  Use extract_low_bits when SHIFT == 0.  Create the shift
	or extraction instructions before trying the replacement.  Update
	dump-file code accordingly, avoiding use of REGNO (store_info->rhs).

gcc/testsuite/
	* gcc.target/mips/dse-1.c: Add checks for zeros.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/dse.c
    trunk/gcc/expmed.c
    trunk/gcc/expr.h
    trunk/gcc/rtlhooks.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/mips/dse-1.c

Comment 4 Richard Sandiford 2008-03-22 19:45:28 UTC

I've applied a patch that should fix this.  No-one's confirmed
whether it does though, so I'm marking the PR as waiting.

Comment 5 Andrew Pinski 2008-03-23 00:48:47 UTC

(In reply to comment #4)
> I've applied a patch that should fix this.  No-one's confirmed
> whether it does though, so I'm marking the PR as waiting.

And I just confirmed, it does fix it, on powerpc64-darwin:

--- old.s       2008-03-22 17:47:53.000000000 -0700
+++ new.s       2008-03-22 17:47:18.000000000 -0700
@@ -24,22 +24,15 @@ _test2:
        stw r0,-4(r1)
        oris r0,r0,0xc000
        mtspr 256,r0
-       li r0,48
-       li r9,64
-       vaddfp v2,v2,v2
-       addi r2,r1,16
-       addi r11,r1,48
+       vaddfp v0,v2,v2
+       addi r2,r1,48
        lwz r12,-4(r1)
-       vaddfp v3,v3,v3
-       stvx v2,0,r11
-       stvx v3,r2,r0
-       lvx v0,0,r11
-       lvx v1,r2,r0
-       addi r2,r1,-128
-       stvx v0,r2,r0
-       stvx v1,r2,r9
-       lvx v2,r2,r0
-       lvx v3,r2,r9
+       vaddfp v1,v3,v3
+       vor v2,v0,v0
+       vor v3,v1,v1
+       stvx v0,0,r2
+       addi r2,r1,64
+       stvx v1,0,r2
        mtspr 256,r12
        blr
        .subsections_via_symbols

Thanks for the fix Richard.

Comment 6 revital eres 2008-03-23 07:20:24 UTC

(In reply to comment #5)
> (In reply to comment #4)
> > I've applied a patch that should fix this.  No-one's confirmed
> > whether it does though, so I'm marking the PR as waiting.
> And I just confirmed, it does fix it, on powerpc64-darwin:

It also fixes the problem on spu-gcc, but on powerpc64-linux 
it seems that the problem still exists:

-----------------------------------
On spu-gcc:

 test2:
-       hbr     .L5,$lr
+       fa      $4,$4,$4
        stqd    $sp,-128($sp)
+       fa      $3,$3,$3
        ai      $sp,$sp,-128
-       stqd    $3,64($sp)
-       stqd    $4,80($sp)
-       lqd     $4,80($sp)
-       lqd     $5,64($sp)
-       fa      $3,$4,$4
-       fa      $2,$5,$5
-       stqd    $3,48($sp)
-       stqd    $2,32($sp)
-       lqd     $4,48($sp)
-       lqd     $3,32($sp)
        ai      $sp,$sp,128
-.L5:
        bi      $lr

-----------------------------------

On powerpc64-linux :

        .file   "struct1.c"
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl test1
        .type   test1, @function
test1:
        stwu 1,-32(1)
        mfvrsave 0
        stw 0,28(1)
        oris 0,0,0xc000
        mtvrsave 0
        lvx 0,0,4
        lvx 1,0,5
        vaddfp 0,0,1
        lwz 12,28(1)
        stvx 0,0,3
        mtvrsave 12
        addi 1,1,32
        blr
        .size   test1,.-test1
        .align 2
        .p2align 4,,15
        .globl test2
        .type   test2, @function
test2:
        stwu 1,-32(1)
        mfvrsave 0
        stw 0,28(1)
        oris 0,0,0xc000
        mtvrsave 0
        li 0,16
        lvx 1,0,4
        lvx 0,4,0
        vaddfp 1,1,1
        vaddfp 0,0,0
        lwz 12,28(1)
        stvx 1,0,4
        stvx 0,4,0
        stvx 0,3,0
        stvx 1,0,3
        mtvrsave 12
        addi 1,1,32
        blr
        .size   test2,.-test2
        .ident  "GCC: (GNU) 4.4.0 20080323 (experimental)"
        .section        .note.GNU-stack,"",@progbits

Comment 7 Andrew Pinski 2008-03-23 17:49:53 UTC

(In reply to comment #6)
> It also fixes the problem on spu-gcc, but on powerpc64-linux 
> it seems that the problem still exists:

No, the problem does not still exist, what exist is an extra store and that is really recorded as PR 30271.

Thanks,
Andrew Pinski