This is the mail archive of the mailing list for the GCC project.

RFA: pervasive SSE codegen inefficiency

Consider the following SSE code, compiled with
-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2:

Attachment: 4256776a.c
Description: Text document

The first inner loop compiles to

        paddq   %xmm0, %xmm1

Good. The second compiles to

        movdqa  %xmm2, %xmm0
        paddw   %xmm1, %xmm0
        movdqa  %xmm0, %xmm1

when it could be using a single paddw. The basic problem is that
our approach defines __m128i to be V2DI even though all the operations
on the object are V4SI, so there are a lot of subregs that don't need
to generate code. I'd like to fix this, but am not sure how to go about it.
The pattern-matching and RTL optimizers seem quite hostile to mismatched
mode operations. If I were starting from scratch I'd define a single V128I mode
and distinguish paddw and paddq by operation codes, or possibly by using
subreg:SSEMODEI throughout the patterns. Any less intrusive ideas? Thanks.
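
For readers without the attachment, here is a minimal sketch of the kind of
source that exercises both cases. It is not the attached file: the function
names sum_epi64 and sum_epi16 and the loop shapes are made up for illustration.
The only assumption is the standard SSE2 intrinsics from <emmintrin.h>, where
_mm_add_epi64 maps to paddq and _mm_add_epi16 maps to paddw. Both take and
return __m128i, so the 16-bit add has to operate on a differently-typed view
of the same 128-bit value.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_add_epi64, _mm_add_epi16 */
#include <stdint.h>

/* Hypothetical stand-in for the first loop: accumulate with 64-bit
   lane adds (paddq).  Reads nvec vectors of two int64_t each. */
void sum_epi64(int64_t out[2], const int64_t *in, int nvec)
{
    __m128i acc = _mm_setzero_si128();
    for (int i = 0; i < nvec; i++) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + 2 * i));
        acc = _mm_add_epi64(acc, v);
    }
    _mm_storeu_si128((__m128i *)out, acc);
}

/* Hypothetical stand-in for the second loop: same shape, but with
   16-bit lane adds (paddw).  Reads nvec vectors of eight int16_t each.
   The accumulator is still declared __m128i, so the element width
   lives only in the choice of intrinsic, not in the type. */
void sum_epi16(int16_t out[8], const int16_t *in, int nvec)
{
    __m128i acc = _mm_setzero_si128();
    for (int i = 0; i < nvec; i++) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + 8 * i));
        acc = _mm_add_epi16(acc, v);
    }
    _mm_storeu_si128((__m128i *)out, acc);
}
```

Compiling something like this with -S and the options above and comparing the
two loop bodies should show whether extra movdqa copies appear around the
paddw; the exact output will depend on the compiler version.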

(ISTR some earlier discussion about this but can't find it; apologies if
I'm reopening something that shouldn't be:)
