Bug 32280 - _mm_srli_si128, heinous code for some shifts
Summary: _mm_srli_si128, heinous code for some shifts
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.3.0
: P3 normal
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2007-06-11 02:59 UTC by tbp
Modified: 2009-12-22 17:52 UTC (History)
1 user (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-06-11 07:44:30


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description tbp 2007-06-11 02:59:59 UTC
I lack words to describe what happens on x86-64 to
<------------------------------------------------------>
#include <emmintrin.h>


__m128i foo(__m128i a) { return _mm_srli_si128(a, 8); }

int main() { return 0; }
<------------------------------------------------------>

# /usr/local/gcc-4.2-20060916/bin/gcc -O1 pr-psrldq.c -o pr-psrldq

000000000040042e <foo>:
  40042e:       66 0f 7f 44 24 d8       movdqa %xmm0,0xffffffffffffffd8(%rsp)
  400434:       48 8b 54 24 e0          mov    0xffffffffffffffe0(%rsp),%rdx
  400439:       48 89 d0                mov    %rdx,%rax
  40043c:       31 d2                   xor    %edx,%edx
  40043e:       48 89 44 24 e8          mov    %rax,0xffffffffffffffe8(%rsp)
  400443:       48 89 54 24 f0          mov    %rdx,0xfffffffffffffff0(%rsp)
  400448:       66 0f 6f 44 24 e8       movdqa 0xffffffffffffffe8(%rsp),%xmm0
  40044e:       c3                      retq

gcc-4.3-20070105 is still that creative.

As far as i know, it's specific to x86-64 but i'm not sure if other shifting ops or specific values also are pathologic.
Comment 1 tbp 2007-06-11 03:02:40 UTC
s/gcc-4.3-20070105/gcc-4.3-20070608/
Comment 2 Uroš Bizjak 2007-06-11 07:35:21 UTC
This is the fault of combine pass. It isn't obvious to me, why it is converting:

(insn 7 6 13 2 shift.c:6 (set (subreg:TI (reg:V2DI 62) 0)
        (ashift:TI (subreg:TI (reg:V2DI 61 [ __a ]) 0)
            (const_int 64 [0x40]))) 1115 {sse2_ashlti3} (insn_list:REG_DEP_TRUE 6 (nil))
    (expr_list:REG_DEAD (reg:V2DI 61 [ __a ])
        (expr_list:REG_EQUAL (ashift:TI (subreg:TI (reg/v:V2DI 60 [ __a ]) 0)
                (const_int 64 [0x40]))
            (nil))))
)

to

(insn 7 6 13 2 shift.c:6 (parallel [
            (set (subreg:TI (reg:V2DI 62) 0)
                (ashift:TI (reg:TI 21 xmm0 [ __a ])
                    (const_int 64 [0x40])))
            (clobber (reg:CC 17 flags))
        ]) 412 {*ashlti3_2} (nil)
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg:V2DI 21 xmm0 [ __a ])
            (nil))))

Note the clobber in above pattern. So, what is wrong with:

(define_insn "sse2_ashlti3"
  [(set (match_operand:TI 0 "register_operand" "=x")
	(ashift:TI (match_operand:TI 1 "register_operand" "0")
		   (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n")))]
  "TARGET_SSE2"
Comment 3 Andrew Pinski 2007-06-11 07:44:30 UTC
Because you should not be have two patterns that could match (with one differs via a clobber).  The way to fix this is have the sse2_ashlti3 one do what ashlti3_2 does for "r" constraints also.

This is the normal two patterns doing the same thing problem.

-- Pinski
Comment 4 uros 2007-06-11 10:13:14 UTC
Subject: Bug 32280

Author: uros
Date: Mon Jun 11 10:13:00 2007
New Revision: 125615

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125615
Log:
	PR target/32280
	* config/i386/sse.md ("sse2_ashlti", "sse2_lshrti3"): Move ...
	* config/i386/i386.md ("sse2_ashlti", "sse2_lshrti3"): ... to here.

testsuite/ChangeLog:
	
	PR target/32280
	* gcc.target/i386/pr32280.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr32280.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog

Comment 5 Uroš Bizjak 2007-06-11 10:18:28 UTC
Fixed.(In reply to comment #3)
> Because you should not be have two patterns that could match (with one differs
> via a clobber).  The way to fix this is have the sse2_ashlti3 one do what
> ashlti3_2 does for "r" constraints also.

But we need clobber for "r" constraint, because in this case, insn is split into logical operations that clobber flags.
Comment 6 Uroš Bizjak 2007-06-12 12:48:17 UTC
Fixed.
Comment 7 uros 2009-12-17 12:33:33 UTC
Subject: Bug 32280

Author: uros
Date: Thu Dec 17 12:33:09 2009
New Revision: 155312

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155312
Log:
	PR target/32280
	* config/i386/i386-modes.def (V1TI): New vector mode.
	* config/i386/i386.h (VALID_SSE_REG_MODE): Add V1TImode.
	(SSE_REG_MODE_P): Ditto.
	* config/i386/sse.md (SSEMODE16): New mode iterator.
	(AVXMODE16): Ditto.
	(avxvecmode): Handle V1TI mode.
	(*avx_mov<mode>_internal): Use AVXMODE16 instead of AVXMODE.
	(mov<mode>): Use SSEMODE16 instead of SSEMODE.
	(*mov<mode>_internal): Ditto.
	(push<mode>1): Ditto.
	(movmisalign<mode>): Ditto.
	(sse2_ashlv1ti): Rename from sse2_ashlti.
	(sse2_lshrv1ti): Rename from sse2_lshrti.
	(*avx_ashlv1ti): Rename from *avx_ashlti and move from i386.md.
	(*avx_lshrv1ti): Rename from *avx_lshrti and move from i386.md.
	(vec_shl_<mode>): Convert operands to V1TImode and use V1TI shift.
	(vec_shr_<mode>): Ditto.
	(*sse2_mulv4si3): Update for renamed sse2_ashlv1ti3.
	(udot_prodv4si): Ditto.
	* config/i386/i386.c (classify_argument): Handle V1TImode.
	(function_arg_advance_32): Ditto.
	(function_arg_32): Ditto.
	(ix86_expand_sse4_unpack): Convert operands to V1TImode and update
	for renamed gen_sse2_lshrv1ti3.
	(ix86_expand_args_builtin) <V2DI_FTYPE_V2DI_INT_CONVERT>: Set rmode
	to V1TImode.
	(struct builtin_description) <__builtin_ia32_pslldqi128>: Update
	for renamed sse2_ashlv1ti3.
	<__builtin_ia32_psrldqi128>: Update for renamed sse2_lshrv1ti3.
	
	Revert:
	2007-06-11  Uros Bizjak  <ubizjak@gmail.com>

	PR target/32280
	* config/i386/sse.md ("sse2_ashlti", "sse2_lshrti3"): Move ...
	* config/i386/i386.md ("sse2_ashlti", "sse2_lshrti3"): ... to here.

testsuite/ChangeLog:

	PR target/32280
	* gcc.target/i386/pr32280-1.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr32280-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386-modes.def
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog