This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: RFA: pervasive SSE codegen inefficiency



On Sep 14, 2005, at 9:21 PM, Dale Johannesen wrote:


Consider the following SSE code
(-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2)
<4256776a.c>

The first inner loop compiles to

paddq %xmm0, %xmm1

Good. The second compiles to

        movdqa  %xmm2, %xmm0
        paddw   %xmm1, %xmm0
        movdqa  %xmm0, %xmm1

when it could be using a single paddw. The basic problem is that
our approach defines __m128i to be V2DI even though all the operations
on the object are V4SI, so there are a lot of subreg's that don't need
to generate code. I'd like to fix this, but am not sure how to go about it.
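The attachment (4256776a.c) is not reproduced here. As a rough illustration of the pattern described above, the following is a minimal sketch using GCC's generic vector extensions rather than the real <emmintrin.h> intrinsics. The names v2di, v8hi, add_epi16, accumulate and lane16 are hypothetical and are not taken from the attachment. It only shows how a value stored as V2DI gets reinterpreted as V8HI for the add and then back again, which is where the subregs in the RTL come from.

```c
#include <assert.h>

/* Hypothetical stand-ins: v2di plays the role of __m128i (two 64-bit lanes),
   v8hi is the view used for 16-bit adds (eight 16-bit lanes). */
typedef long long v2di __attribute__((vector_size(16)));
typedef short     v8hi __attribute__((vector_size(16)));

/* 16-bit lane-wise add on values stored as V2DI. The casts only reinterpret
   the 128 bits; they should not need any instructions of their own. */
static v2di add_epi16(v2di a, v2di b)
{
    return (v2di)((v8hi)a + (v8hi)b);
}

/* Hypothetical loop in the spirit of the second inner loop: a = z + a,
   repeated n times, with both values held as V2DI between iterations. */
static v2di accumulate(v2di a, v2di z, int n)
{
    for (int i = 0; i < n; i++)
        a = add_epi16(z, a);
    return a;
}

/* Read 16-bit lane i (0..7) of a V2DI value. */
static short lane16(v2di v, int i)
{
    assert(i >= 0 && i < 8);
    return ((v8hi)v)[i];
}
```

Compiling something of this shape with the options quoted above and inspecting the assembly for the loop body is one way to check whether a single paddw is produced; the result will depend on the compiler version.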

From the looks of it, this is more of a register allocation issue and has nothing to do with subregs at all, other than the subregs being there.


Take a look at .greg:
;; 4 regs to allocate: 64 (4) 61 63 (4) 65
;; 61 conflicts: 61 63 64 65 66 7 21
;; 63 conflicts: 61 63 64 65 66 7 21 22
;; 64 conflicts: 61 63 64 65 7
;; 64 preferences: 21 22
;; 65 conflicts: 61 63 64 65 66 7 21
;; 66 conflicts: 61 63 65 66 7 21
;; 66 preferences: 22
;; 67 conflicts: 67 7 21
;; 67 preferences: 22

and then look at the allocation:
(reg:V8HI 21 xmm0 [66])
(reg:V8HI 22 xmm1 [orig:64 a ] [64])
(reg/v:V2DI 23 xmm2 [orig:63 z ] [63])


Original instructions:


(insn:HI 23 21 25 2 (set (reg:V8HI 66)
(plus:V8HI (subreg:V8HI (reg/v:V2DI 63 [ z ]) 0)
(subreg:V8HI (reg/v:V2DI 64 [ a ]) 0))) 680 {*addv8hi3} (nil)
(expr_list:REG_DEAD (reg/v:V2DI 64 [ a ])
(nil)))
(insn:HI 25 23 27 2 (set (reg/v:V2DI 64 [ a ])
(subreg:V2DI (reg:V8HI 66) 0)) 542 {*movv2di_internal} (insn_list:REG_DEP_TRUE 23 (nil))
(expr_list:REG_DEAD (reg:V8HI 66)
(nil)))


(insn:HI 33 31 38 3 (set (reg:V8HI 67)
(plus:V8HI (subreg:V8HI (reg/v:V2DI 64 [ a ]) 0)
(subreg:V8HI (reg/v:V2DI 64 [ a ]) 0))) 680 {*addv8hi3} (nil)
(expr_list:REG_DEAD (reg/v:V2DI 64 [ a ])
(nil)))


(note:HI 38 33 41 3 NOTE_INSN_FUNCTION_END)

(insn:HI 41 38 47 3 (set (reg/i:V2DI 21 xmm0 [ <result> ])
(subreg:V2DI (reg:V8HI 67) 0)) 542 {*movv2di_internal} (insn_list:REG_DEP_TRUE 33 (nil))
(expr_list:REG_DEAD (reg:V8HI 67)
(nil)))



Instructions after allocation:

(insn 60 21 23 2 (set (reg:V8HI 21 xmm0 [66])
        (reg:V8HI 23 xmm2)) 540 {*movv8hi_internal} (nil)
    (nil))

(insn:HI 23 60 25 2 (set (reg:V8HI 21 xmm0 [66])
        (plus:V8HI (reg:V8HI 21 xmm0 [66])
            (reg:V8HI 22 xmm1 [orig:64 a ] [64]))) 680 {*addv8hi3} (nil)
    (nil))

(insn:HI 25 23 27 2 (set (reg/v:V2DI 22 xmm1 [orig:64 a ] [64])
(reg:V2DI 21 xmm0 [66])) 542 {*movv2di_internal} (insn_list:REG_DEP_TRUE 23 (nil))
(nil))
...
(insn 61 31 33 3 (set (reg:V8HI 21 xmm0 [67])
(reg:V8HI 22 xmm1)) 540 {*movv8hi_internal} (nil)
(nil))


(insn:HI 33 61 38 3 (set (reg:V8HI 21 xmm0 [67])
        (plus:V8HI (reg:V8HI 21 xmm0 [67])
            (reg:V8HI 22 xmm1 [orig:64 a ] [64]))) 680 {*addv8hi3} (nil)
    (nil))

(note:HI 38 33 41 3 NOTE_INSN_FUNCTION_END)

(insn:HI 41 38 47 3 (set (reg/i:V2DI 21 xmm0 [ <result> ])
(reg:V2DI 21 xmm0 [67])) 542 {*movv2di_internal} (insn_list:REG_DEP_TRUE 33 (nil))
(nil))


If we allocated 64 and 63 as the same register, it would have worked correctly.


Yes, removing the extra set helps, but it does not solve the real issue of the
register allocator being stupid.


Thanks,
Andrew Pinski

