[patch] tuning gcc for AMDFAM10 processor (patch 3)


Hi Richard,

>On Mon, Jan 29, 2007 at 07:12:44PM -0600, Jagasia, Harsha wrote:
>> +             xorps  reg3, reg3
>> +             movaps reg3, reg2
>
>Surely you're not advocating *moving* a zero.  =)

Actually, this is something the current mainline compiler already does;
it is not being introduced by this patch. This patch does not enable
x86_sse_unaligned_move_optimal for any target other than amdfam10.  So
mtune=generic leaves x86_sse_unaligned_move_optimal disabled but enables
x86_sse_partial_reg_dependency, and the resulting code is as indicated
in the comments:

+   Code generation for unaligned packed loads of single precision
+   data:
+     if (x86_sse_partial_reg_dependency == true)
+       {
+         if (x86_sse_unaligned_move_optimal == true)
+           {
+             movups mem, reg
+           }
+         else
+           {
+             xorps  reg3, reg3
+             movaps reg3, reg2
+             movlps mem, reg2
+             movhps mem+8, reg2
+           }
+       }
 
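For illustration, the kind of loop that exercises these paths is one
where the vectorizer cannot prove 16-byte alignment of the arrays, so
it must emit unaligned packed loads of single-precision data.  A purely
hypothetical sketch (not an extracted test case):

    /* Hypothetical reduced example: compiled with -O3 -ftree-vectorize
       and no provable 16-byte alignment, the single-precision loads
       below become unaligned packed loads and take one of the paths
       described in the comment above.  */
    void
    vec_add (float *a, const float *b, const float *c, int n)
    {
      int i;
      for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
    }
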
I built one of the Polyhedron benchmarks with "gfortran -march=k8
-mtune=generic -O3 -ftree-vectorize -w -S aermod.f90 -o
generic/aermod.s".  (I have not extracted a simple test case yet, but
we have observed this with other Polyhedron and CPU2006 FP benchmarks
as well.)

Snippet:
        xorps   %xmm2, %xmm2
        shufps  $0, %xmm0, %xmm0
        addq    $4, %rax
        leaq    (%r12,%rax), %r11
        leaq    (%rbp,%rax), %r10
        leaq    (%rdi,%rax), %rax
        xorl    %r8d, %r8d
        xorl    %ecx, %ecx
        movaps  %xmm0, %xmm3
        .p2align 4,,7
.L154:
        movaps  %xmm2, %xmm0
        addq    $1, %r8
        movaps  %xmm2, %xmm1
        movlps  (%rcx,%r10), %xmm0
        movlps  (%rcx,%r11), %xmm1
        movhps  8(%rcx,%r10), %xmm0
        movhps  8(%rcx,%r11), %xmm1

It is possible that this is a bug in the generic tuning. I think we
need Honza to pitch in on this, as he wrote this code. However, the
last I heard, he is attending a course on ergodic Ramsey theory
somewhere in the mountains without much internet access.

Meanwhile, perhaps I should fix the comments in the patch I posted to
indicate clearly what is new because of this patch and what was already
being done. Thoughts?

>
>> @@ -9434,6 +9491,13 @@ ix86_expand_vector_move_misalign (enum m
>>  	    }
>>  	  else
>>  	    {
>> +	      if (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL)
>> +                {
>> +                  op0 = gen_lowpart (V2DFmode, op0);
>> +                  op1 = gen_lowpart (V2DFmode, op1);
>> +                  emit_insn (gen_sse2_movupd (op0, op1));
>> +                  return;
>> +                }
>>  	      /* ??? Not sure about the best option for the Intel chips.
>>  		 The following would seem to satisfy; the register is
>>  		 entirely cleared, breaking the dependency chain.  We
>> @@ -9453,7 +9517,16 @@ ix86_expand_vector_move_misalign (enum m
>>        else
>>  	{
>>  	  if (TARGET_SSE_PARTIAL_REG_DEPENDENCY)
>> +	    {
>> +	      if (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL)
>> +                {
>> +		  op0 = gen_lowpart (V4SFmode, op0);
>> +                  op1 = gen_lowpart (V4SFmode, op1);
>> +		  emit_insn (gen_sse_movups (op0, op1));
>> +		  return;
>> +		}
>>  	    emit_move_insn (op0, CONST0_RTX (mode));
>> +	    }
>
>Un-nest both of these blocks from the IF they're inside.
>TARGET_SSE_UNALIGNED_MOVE_OPTIMAL really has no bearing on
>TARGET_SSE_PARTIAL_REG_DEPENDENCY or TARGET_SSE_SPLIT_REGS,
>and should override both of them.

Ok, I will change this.
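To be concrete, I take it you mean something roughly like this for the
V4SF arm (a sketch only; the V2DF arm would be restructured the same
way):

      if (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL)
        {
          /* On targets where unaligned moves are cheap, emit movups
             directly; this overrides both the partial-reg-dependency
             and split-regs tunings.  */
          op0 = gen_lowpart (V4SFmode, op0);
          op1 = gen_lowpart (V4SFmode, op1);
          emit_insn (gen_sse_movups (op0, op1));
          return;
        }
      if (TARGET_SSE_PARTIAL_REG_DEPENDENCY)
        emit_move_insn (op0, CONST0_RTX (mode));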

Thanks,
Harsha


