[PATCH] Allow fwprop to undo vectorization harm (PR68961)

Tue Jul 12 22:08:00 GMT 2016

On Sun, Jul 10, 2016 at 10:12 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, Jul 6, 2016 at 3:18 PM, Richard Biener <rguenther@suse.de> wrote:
>
>>> > 2016-07-04  Richard Biener  <rguenther@suse.de>
>>> >
>>> >     PR rtl-optimization/68961
>>> >     * fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and CONCAT
>>> >     to simplify to a non-constant.
>>> >
>>> >     * gcc.target/i386/pr68961.c: New testcase.
>>>
>>> Thanks, LGTM.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, it causes
>>
>> FAIL: gcc.target/i386/sse2-load-multi.c scan-assembler-times movup 2
>>
>> as the peephole created for that testcase no longer applies as fwprop
>> does
>>
>> In insn 10, replacing
>>  (vec_concat:V2DF (vec_select:DF (reg:V2DF 91)
>>             (parallel [
>>                     (const_int 0 [0])
>>                 ]))
>>         (mem:DF (reg/f:DI 95) [0  S8 A128]))
>>  with (vec_concat:V2DF (reg:DF 93 [ MEM[(const double *)&a + 8B] ])
>>         (mem:DF (reg/f:DI 95) [0  S8 A128]))
>> Changed insn 10
>>
>> resulting in
>>
>>         movsd   a+8(%rip), %xmm0
>>         movhpd  a+16(%rip), %xmm0
>>
>> again rather than movupd.
>>
>> Uros, there is probably a missing peephole for the new form - can you
>> fix this as a followup or should I hold on this patch for a bit longer?
>
> No, please proceed with the patch, I'll fix this fallout with a
> followup patch in a couple of days.

Fixed with the attached patch, which adds a peephole2 variant that
matches the new form produced by fwprop.
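
For readers without the testsuite at hand, the load sequence discussed
above can be reproduced with something like the following. This is only
a minimal sketch, not the contents of sse2-load-multi.c: the array
initializer, the load_pair name and the non-SSE2 fallback are
assumptions added for illustration.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: _mm_load_sd, _mm_loadh_pd */
#endif

/* Global array, as suggested by the a+8(%rip) / a+16(%rip) operands
   in the assembly quoted above.  The initializer is illustrative.  */
double a[8] = { 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0 };

/* Hypothetical helper: load a[1] and a[2] as a pair into out[0..1].  */
void
load_pair (double out[2])
{
#if defined(__SSE2__)
  __m128d res = _mm_load_sd (&a[1]);   /* low half  <- a[1], high half zeroed */
  res = _mm_loadh_pd (res, &a[2]);     /* high half <- a[2] */
  _mm_storeu_pd (out, res);            /* write both halves back out */
#else
  /* Portable fallback so the sketch still builds without SSE2.  */
  out[0] = a[1];
  out[1] = a[2];
#endif
}
```

The two intrinsics correspond to the movsd/movhpd pair; when compiled
with optimization for a tuning where TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
holds, the intent of the peephole2 patterns is that the pair is emitted
as a single movupd.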

2016-07-13  Uros Bizjak  <ubizjak@gmail.com>

    PR rtl-optimization/68961
    * config/i386/sse.md (movsd/movhpd to movupd peephole2s): Add new
    peephole variant.  Use sse_reg_operand predicates.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.
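
As a rough illustration of when the new peephole2 variant is meant to
fire, here is a toy model of its condition in plain C. It is a sketch
under stated assumptions, not GCC code: the toy_* types and names are
hypothetical, and the checks attributed to
ix86_operands_ok_for_move_multiple (same destination register, same
base, consecutive offsets) reflect my reading of that helper.

```c
#include <stdbool.h>

/* Toy stand-in for a [base + offset] memory operand; not a GCC type.  */
struct toy_mem { int base_regno; long offset; };

/* Toy stand-in for the two insns matched by the new peephole2:
     insn 1: (set (reg:DF r0) (mem:DF m1))
     insn 2: (set (reg:V2DF r2)
                  (vec_concat:V2DF (reg:DF r4) (mem:DF m3)))  */
struct toy_pair
{
  int r0; struct toy_mem m1;
  int r2; int r4; struct toy_mem m3;
};

/* Return true if the modelled pair could be merged into one load.  */
bool
toy_can_merge (const struct toy_pair *p, long mode_size)
{
  /* Mirrors REGNO (operands[4]) == REGNO (operands[2]).  */
  if (p->r4 != p->r2)
    return false;
  /* Assumed: both loads must target the same hard register.  */
  if (p->r0 != p->r2)
    return false;
  /* Assumed: same base register and consecutive offsets.  */
  if (p->m1.base_regno != p->m3.base_regno)
    return false;
  return p->m1.offset + mode_size == p->m3.offset;
}
```

With an 8-byte DFmode, loads from offsets 8 and 16 off the same base
into the same register pass the model, while a gap between the two
addresses or a different destination register does not.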

Uros.
-------------- next part --------------
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md	(revision 238258)
+++ config/i386/sse.md	(working copy)
@@ -1169,10 +1169,10 @@
 
 ;; Merge movsd/movhpd to movupd for TARGET_SSE_UNALIGNED_LOAD_OPTIMAL targets.
 (define_peephole2
-  [(set (match_operand:V2DF 0 "register_operand")
+  [(set (match_operand:V2DF 0 "sse_reg_operand")
 	(vec_concat:V2DF (match_operand:DF 1 "memory_operand")
 			 (match_operand:DF 4 "const0_operand")))
-   (set (match_operand:V2DF 2 "register_operand")
+   (set (match_operand:V2DF 2 "sse_reg_operand")
 	(vec_concat:V2DF (vec_select:DF (match_dup 2)
 					(parallel [(const_int 0)]))
 			 (match_operand:DF 3 "memory_operand")))]
@@ -1181,13 +1181,25 @@
   [(set (match_dup 2) (match_dup 4))]
   "operands[4] = adjust_address (operands[1], V2DFmode, 0);")
 
+(define_peephole2
+  [(set (match_operand:DF 0 "sse_reg_operand")
+	(match_operand:DF 1 "memory_operand"))
+   (set (match_operand:V2DF 2 "sse_reg_operand")
+	(vec_concat:V2DF (match_operand:DF 4 "sse_reg_operand")
+			 (match_operand:DF 3 "memory_operand")))]
+  "TARGET_SSE2 && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
+   && REGNO (operands[4]) == REGNO (operands[2])
+   && ix86_operands_ok_for_move_multiple (operands, true, DFmode)"
+  [(set (match_dup 2) (match_dup 4))]
+  "operands[4] = adjust_address (operands[1], V2DFmode, 0);")
+
 ;; Merge movlpd/movhpd to movupd for TARGET_SSE_UNALIGNED_STORE_OPTIMAL targets.
 (define_peephole2
   [(set (match_operand:DF 0 "memory_operand")
-	(vec_select:DF (match_operand:V2DF 1 "register_operand")
+	(vec_select:DF (match_operand:V2DF 1 "sse_reg_operand")
 		       (parallel [(const_int 0)])))
    (set (match_operand:DF 2 "memory_operand")
-	(vec_select:DF (match_operand:V2DF 3 "register_operand")
+	(vec_select:DF (match_operand:V2DF 3 "sse_reg_operand")
 		       (parallel [(const_int 1)])))]
   "TARGET_SSE2 && TARGET_SSE_UNALIGNED_STORE_OPTIMAL
    && ix86_operands_ok_for_move_multiple (operands, false, DFmode)"