is -O2 breaking sse2 alignment?
Sat Mar 15 00:10:00 GMT 2008
> What I am _really_ trying to do is to implement the addition of
> elements of two arrays.
> Is there a more efficient way of doing this than this way?:
Question from someone who has just written his first few lines of SSE2 (exciting, though let's not get too excited until we can actually beat the plain, SSE-free compile): how many SSE2 instructions can execute at the same time?

I would have thought that if there is much optimising to be done, it will be in keeping all the registers loaded and running several SSE instructions in parallel. Presumably the challenge is organising the traffic to and from the registers so that the loads do not all arrive at once in a spike. Instead we would load one pair of registers while adding together another pair and writing out the result of a third, all at the same time. That kind of thing.

Am I on the right track, or am I way off the mark?
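To make the idea concrete, here is a minimal sketch of what I mean, not a claim that it is the fastest way. It adds two arrays of 32-bit integers, with the loop unrolled so that each iteration holds two independent load/add/store chains that the CPU is free to overlap. The function name add_arrays_sse2 and the choice of int32_t elements are my own assumptions for illustration. It uses the unaligned load and store intrinsics so that it does not depend on how the arrays are aligned. On 32-bit x86 it needs -msse2. Whether it beats the compiler's own output at -O2 is something only a measurement will tell.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for n 32-bit integers. */
void add_arrays_sse2(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;

    /* Main loop: 8 integers per iteration, as two independent chains. */
    for (; i + 8 <= n; i += 8) {
        /* Unaligned loads: no 16-byte alignment requirement on a or b. */
        __m128i a0 = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i b0 = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i a1 = _mm_loadu_si128((const __m128i *)(a + i + 4));
        __m128i b1 = _mm_loadu_si128((const __m128i *)(b + i + 4));

        /* The two adds do not depend on each other. */
        __m128i s0 = _mm_add_epi32(a0, b0);
        __m128i s1 = _mm_add_epi32(a1, b1);

        _mm_storeu_si128((__m128i *)(dst + i), s0);
        _mm_storeu_si128((__m128i *)(dst + i + 4), s1);
    }

    /* Scalar tail for the last n % 8 elements. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

If the arrays are known to be 16-byte aligned, the aligned variants _mm_load_si128 and _mm_store_si128 could be substituted, but they fault on misaligned addresses, so I have kept the unaligned forms here.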