Bug 32107 - bad codegen for vector initialization in Altivec
Summary: bad codegen for vector initialization in Altivec
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.3.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2007-05-27 21:14 UTC by Dorit Naishlos
Modified: 2022-03-08 16:20 UTC
CC List: 5 users

See Also:
Host:
Target: powerpc*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-05-27 21:31:45


Description Dorit Naishlos 2007-05-27 21:14:14 UTC
Compiling the following testcase:

#define vector __attribute__((__vector_size__(16) ))
float fa[100] __attribute__ ((__aligned__(16)));
vector float foo ()
{
  float f = fa[0];
  vector float vf = {f, f, f, f};
  return vf;
}

...with gcc -O2 -maltivec, we get:

ld      r9,0(r2)
lfs     f0,0(r9)
addi    r9,r1,-16
stfs    f0,-16(r1)
lvewx   v2,r0,r9
vspltw  v2,v2,0
blr

My problem is with the {lfs,stfs,lvewx} sequence: we load the value into f0 and then store it (with stfs) to an aligned stack slot, just so that lvewx can load it from there into a vector register. However, the address from which f0 was loaded is already known to be aligned, so we could do the lvewx directly from that address and avoid the extra {lfs,stfs}. The following should be enough:

ld      r9,0(r2)
lvewx   v2,r0,r9
vspltw  v2,v2,0
blr
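To see why splatting element 0 is correct in both sequences, here is a small portable C model of the two instructions. It is a simplified sketch under stated assumptions, not the ISA definition: the vreg type and the model_* helpers are hypothetical, the vector is treated as 16 byte slots numbered from the left (big-endian element order), and the buffer passed as base is assumed to start on a 16-byte boundary like fa, so only the low bits of the offset matter. lvewx places the word at slot (EA & 0xF), which is element 0 exactly when EA is 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of a 128-bit AltiVec register: 16 byte slots,
   slot 0 being the leftmost (big-endian element order). */
typedef struct { uint8_t b[16]; } vreg;

/* Model of lvewx. `base` is assumed 16-byte aligned (like fa), so the
   effective address modulo 16 is just `off` modulo 16. The word is copied
   into the slots matching its position within the aligned quadword; the
   other slots are undefined on hardware and are filled with 0xAA here. */
static vreg model_lvewx(const uint8_t *base, size_t off)
{
    vreg v;
    size_t ea = off & ~(size_t)3;   /* the low two address bits are ignored */
    memset(v.b, 0xAA, sizeof v.b);
    memcpy(&v.b[ea & 0xF], base + ea, 4);
    return v;
}

/* Model of vspltw: replicate word element `idx` (0..3) into all four words. */
static vreg model_vspltw(vreg v, unsigned idx)
{
    vreg r;
    assert(idx < 4);
    for (unsigned w = 0; w < 4; w++)
        memcpy(&r.b[4 * w], &v.b[4 * idx], 4);
    return r;
}

/* Word element that holds the loaded value after lvewx at `off`. */
static unsigned lvewx_element(size_t off)
{
    return (unsigned)((off & 0xF) >> 2);
}

/* 1 if every word of v holds the float x (compared bit for bit). */
static int all_words_are(vreg v, float x)
{
    for (unsigned w = 0; w < 4; w++)
        if (memcmp(&v.b[4 * w], &x, 4) != 0)
            return 0;
    return 1;
}
```

With off = 0 (the aligned fa[0] case) the value lands in element 0, so a fixed vspltw v2,v2,0 is right; for a word at a known offset of 24 bytes the splat would need index 2 instead.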
  
The problem is that rs6000_expand_vector_init doesn't know that f0 originates from an aligned address. It gets the following as vals:

(parallel:V4SF [
        (reg/v:SF 119 [ f ])
        (reg/v:SF 119 [ f ])
        (reg/v:SF 119 [ f ])
        (reg/v:SF 119 [ f ])
    ])

We somehow want to expand 'f = fa[0]' and '{f,f,f,f}' together: if expand_vector_init could get '{fa[0],fa[0],fa[0],fa[0]}' as vals, it could see that the original address is aligned.
Alternatively, the prospects of getting rid of the redundant load and store later on, during some kind of peephole optimization, don't seem very high to me... Thoughts?
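A minimal sketch of the decision described above, written as plain C rather than GCC internals. Everything in it is hypothetical: elt_src, init_strategy and choose_v4sf_splat_strategy are illustrative names and are not part of rs6000_expand_vector_init or GCC's RTL. It only shows why the expander needs the memory reference: given register-only vals like the dump above (reg 119 four times), it has nothing to go on and falls back to the stack temporary, whereas a 16-byte-aligned memory source would allow lvewx directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: where one element of a vector initializer comes from. */
typedef enum { SRC_REG, SRC_MEM } src_kind;

typedef struct {
    src_kind kind;
    int id;          /* register number, or an identifier for the MEM */
    unsigned align;  /* known alignment in bytes; meaningful for SRC_MEM */
} elt_src;

typedef enum {
    INIT_VIA_STACK_TEMP, /* lfs; stfs to stack; lvewx; vspltw (as reported) */
    INIT_DIRECT_LVEWX    /* lvewx from the source address; vspltw 0 */
} init_strategy;

/* True if all four elements name the same source, i.e. {x, x, x, x}. */
static bool is_splat(const elt_src e[4])
{
    for (int i = 1; i < 4; i++)
        if (e[i].kind != e[0].kind || e[i].id != e[0].id)
            return false;
    return true;
}

/* Pick an expansion for a V4SF splat. Only a MEM with known 16-byte
   alignment guarantees that lvewx puts the value in element 0. */
static init_strategy choose_v4sf_splat_strategy(const elt_src e[4])
{
    assert(is_splat(e));
    if (e[0].kind == SRC_MEM && e[0].align >= 16)
        return INIT_DIRECT_LVEWX;
    return INIT_VIA_STACK_TEMP;
}
```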

This may be related to PR31334 (though there the issue is about initialization with constants, so I'm not sure if the idea for a solution proposed there would help us here).
Comment 1 Andrew Pinski 2007-05-27 21:31:45 UTC
This is unrelated to that one.  Even if we did not have an aligned address, we can do better than the extra load/store (but I forget how to do this and I did not write anything on this for vec_splats in the C/C++ language extension for the Cell).
Confirmed.
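One common AltiVec idiom for splatting a scalar from a word-aligned address whose 16-byte alignment is unknown, without the stack round trip, is lvewx, then lvsl/vperm to rotate the word into element 0, then vspltw 0 (vec_lde, vec_lvsl, vec_perm and vec_splat in altivec.h terms). Whether this is what comment 1 had in mind is an assumption. The sketch below models those instructions in portable C; the vr128 type and the *_m helpers are hypothetical, and base is again assumed to be 16-byte aligned so that only the offset matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16 byte slots, slot 0 leftmost (big-endian element order). */
typedef struct { uint8_t b[16]; } vr128;

/* lvewx: the word at word-aligned offset `off` lands in slots (off & 0xC)..+3.
   Other slots are undefined on hardware; zeroed here. `base` is assumed
   to be 16-byte aligned. */
static vr128 lvewx_m(const uint8_t *base, size_t off)
{
    vr128 v;
    size_t ea = off & ~(size_t)3;
    memset(v.b, 0, sizeof v.b);
    memcpy(&v.b[ea & 0xF], base + ea, 4);
    return v;
}

/* lvsl: permute control {sh, sh+1, ..., sh+15} with sh = ea & 0xF. */
static vr128 lvsl_m(size_t off)
{
    vr128 c;
    unsigned sh = (unsigned)(off & 0xF);
    for (unsigned i = 0; i < 16; i++)
        c.b[i] = (uint8_t)(sh + i);
    return c;
}

/* vperm: each control byte (low 5 bits) selects one byte of a||b. */
static vr128 vperm_m(vr128 a, vr128 b, vr128 ctl)
{
    uint8_t cat[32];
    vr128 r;
    memcpy(cat, a.b, 16);
    memcpy(cat + 16, b.b, 16);
    for (unsigned i = 0; i < 16; i++)
        r.b[i] = cat[ctl.b[i] & 0x1F];
    return r;
}

/* vspltw with index 0: copy word 0 into all four words. */
static vr128 vspltw0_m(vr128 v)
{
    vr128 r;
    for (unsigned w = 0; w < 4; w++)
        memcpy(&r.b[4 * w], &v.b[0], 4);
    return r;
}

/* lvewx + lvsl/vperm + vspltw 0: no lfs/stfs, works for any word offset. */
static vr128 splat_word_m(const uint8_t *base, size_t off)
{
    vr128 v;
    assert((off & 3) == 0);           /* model assumes a word-aligned offset */
    v = lvewx_m(base, off);
    v = vperm_m(v, v, lvsl_m(off));   /* rotate the loaded word into slots 0..3 */
    return vspltw0_m(v);
}

/* 1 if every word of v holds the float x (compared bit for bit). */
static int all_words_eq(vr128 v, float x)
{
    for (unsigned w = 0; w < 4; w++)
        if (memcmp(&v.b[4 * w], &x, 4) != 0)
            return 0;
    return 1;
}
```

The permute costs one extra instruction (plus the lvsl), but it replaces a store and a reload through the stack, and it does not depend on an alignment attribute.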
Comment 2 Andrew Pinski 2008-03-11 21:34:43 UTC
Mine.  The patches I have from the PS3 toolchain fix this one; it is also related to PR 32110.
Comment 3 Andrew Pinski 2008-03-11 21:35:32 UTC
Mine, I said.
Comment 4 Andrew Pinski 2008-07-14 12:47:22 UTC
Note with the Cell, we can just use lvlx with a splat and that works without an alignment attribute on fa :).
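The point about lvlx can be checked with a similar model, assuming the Cell PPU's documented behavior for lvlx (load vector left indexed): the bytes from the effective address up to the next 16-byte boundary are loaded left-justified starting at byte 0, and the remaining bytes are zeroed. The scalar therefore always lands in element 0, so a fixed vspltw 0 works whatever the alignment of fa. The qv type and helper names are hypothetical, and base is assumed 16-byte aligned so that off & 0xF is the position within the quadword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qv;   /* slot 0 = leftmost byte */

/* lvlx model: bytes from `off` up to the next 16-byte boundary are placed
   left-justified at slot 0; the remaining slots are zeroed. */
static qv lvlx_m(const uint8_t *base, size_t off)
{
    qv v;
    size_t n = 16 - (off & 0xF);
    memset(v.b, 0, sizeof v.b);
    memcpy(v.b, base + off, n);
    return v;
}

/* lvlx + vspltw 0: splat the 4-byte value at `off`. */
static qv splat_lvlx_m(const uint8_t *base, size_t off)
{
    qv v, r;
    /* lvlx alone only sees bytes up to the 16-byte boundary, so the value
       must not straddle it. */
    assert((off & 0xF) <= 12);
    v = lvlx_m(base, off);
    for (unsigned w = 0; w < 4; w++)
        memcpy(&r.b[4 * w], &v.b[0], 4);
    return r;
}

/* 1 if every word of v holds the float x (compared bit for bit). */
static int qv_all_eq(qv v, float x)
{
    for (unsigned w = 0; w < 4; w++)
        if (memcmp(&v.b[4 * w], &x, 4) != 0)
            return 0;
    return 1;
}
```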
Comment 5 revital eres 2009-04-26 07:29:42 UTC
(In reply to comment #2)
> Mine.  The patches which I have from the PS3 toolchain fixes this one, it is
> related to PR 32110 also.

I see this problem still exists on trunk -r146794.
If you still have the patch I will be happy to test it.


Comment 6 Andrew Pinski 2011-11-17 21:59:57 UTC
No longer working on this.