This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, i386]: Enable SSE -> GPR moves for generic x86 targets (PR target/54349)


Hi Uros,

I was just updating an old bug (GCC generates MMX instructions but
fails to generate "emms"
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44578) with a case where I
am now hitting this same issue. But as I was including reproducer info
I found that the problem went away with a compiler I just updated and
rebuilt this morning. Turns out that your patch makes this problem
disappear. I think this is just a side-effect of your change though.
Specifically, enabling TARGET_INTER_UNIT_MOVES_FROM_VEC for Generic
and using that in inline_secondary_memory_needed() is changing
instruction selection in a lucky way for my test case (confirmed by
hand-modifying the code to use TARGET_INTER_UNIT_MOVES_TO_VEC instead
of TARGET_INTER_UNIT_MOVES_FROM_VEC). The code is the same going into
reload, but in reload I get the following difference for a
*zero_extendsidi2:

< Choosing alt 6 in insn 11:  (0) ?*y  (1) m
<       Creating newreg=74 from oldreg=67, assigning class MMX_REGS to r74
---
> Choosing alt 8 in insn 11:  (0) ?*x  (1) m
>       Creating newreg=74 from oldreg=67, assigning class SSE_REGS to r74

Would you agree that this is just a lucky side-effect and that there
is still a bug here? I think I will go ahead and update the bug (I can
still reproduce it with -mtune=athlon64). Here is the test case:

---------------------------
I have another instance of this issue. Trunk is generating move
instructions to implement an inlined memcpy. The move instructions use
the MMX registers, but no EMMS instruction is generated. My testcase
then calls a libm function that uses the FPU, which returns incorrect
results. This worked with an older gcc 4.7 based compiler, which
didn't use MMX registers.

The compiler was configured for x86_64-unknown-linux-gnu. The testcase
was compiled with -O2.

$ cat test.cc
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <stdio.h>

namespace {
volatile double dd = 0.080553657784353652;
double dds, ddc;
}

unsigned long long test(float num) {
  if (num < 0) {
    num = 0;
  }

  unsigned int i;
  memcpy(&i, &num, sizeof(unsigned int));
  unsigned long long a = i;

  sincos(dd, &dds, &ddc);
  if (isnan(dds) || isnan(ddc))
  {
    printf ("Failed\n");
    exit (1);
  }
  return a;
}

$ cat test_main.cc
#include <stdio.h>

extern unsigned long long test(float num);
int main()
{
  unsigned long long h = test(1);
  printf ("Passed\n");
}

$ g++ -O2 test*.cc -mtune=athlon64

$ a.out
Failed
---------------------------

Thanks,
Teresa

On Mon, Apr 29, 2013 at 4:08 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> Hello!
>
> Attached patch enables SSE -> general register moves for generic x86
> targets. The patch splits TARGET_INTER_UNIT_MOVES to
> TARGET_INTER_UNIT_MOVES_TO_VEC and TARGET_INTER_UNIT_MOVES_FROM_VEC
> tuning flags and updates gcc sources accordingly.
>
> According to AMD optimization manuals, direct moves *FROM* SSE (and
> MMX) registers *TO* general registers should be used for AMD K10
> family and later families. Since Intel targets are unaffected by this
> change, I have also changed generic setting to enable these moves for
> a generic target tuning.
>
> 2013-04-29  Uros Bizjak  <ubizjak@gmail.com>
>
>     PR target/54349
>     * config/i386/i386.h (enum ix86_tune_indices)
>     <X86_TUNE_INTER_UNIT_MOVES_TO_VEC, X86_TUNE_INTER_UNIT_MOVES_FROM_VEC>:
>     New, split from X86_TUNE_INTER_UNIT_MOVES.
>     <X86_TUNE_INTER_UNIT_MOVES>: Remove.
>     (TARGET_INTER_UNIT_MOVES_TO_VEC): New define.
>     (TARGET_INTER_UNIT_MOVES_FROM_VEC): Ditto.
>     (TARGET_INTER_UNIT_MOVES): Remove.
>     * config/i386/i386.c (initial_ix86_tune_features): Update.
>     Disable X86_TUNE_INTER_UNIT_MOVES_FROM_VEC for m_ATHLON_K8 only.
>     (ix86_expand_convert_uns_didf_sse): Use
>     TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
>     (ix86_expand_vector_init_one_nonzero): Ditto.
>     (ix86_expand_vector_init_interleave): Ditto.
>     (inline_secondary_memory_needed): Return true for moves from SSE class
>     registers for !TARGET_INTER_UNIT_MOVES_FROM_VEC targets and for moves
>     to SSE class registers for !TARGET_INTER_UNIT_MOVES_TO_VEC targets.
>     * config/i386/constraints.md (Yi, Ym): Depend on
>     TARGET_INTER_UNIT_MOVES_TO_VEC.
>     (Yj, Yn): New constraints.
>     * config/i386/i386.md (*movdi_internal): Change constraints of
>     operand 1 from Yi to Yj and from Ym to Yn.
>     (*movsi_internal): Ditto.
>     (*movdf_internal): Ditto.
>     (*movsf_internal): Ditto.
>     (*float<SWI48x:mode><X87MODEF:mode>2_1): Use
>     TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
>     (*float<SWI48x:mode><X87MODEF:mode>2_1 splitters): Ditto.
>     (floatdi<X87MODEF:mode>2_i387_with_xmm): Ditto.
>     (floatdi<X87MODEF:mode>2_i387_with_xmm splitters): Ditto.
>     * config/i386/sse.md (movdi_to_sse): Ditto.
>     (sse2_stored): Change constraint of operand 1 from Yi to Yj.
>     Use TARGET_INTER_UNIT_MOVES_FROM_VEC instead of
>     TARGET_INTER_UNIT_MOVES.
>     (sse_storeq_rex64): Change constraint of operand 1 from Yi to Yj.
>     (sse_storeq_rex64 splitter): Use TARGET_INTER_UNIT_MOVES_FROM_VEC
>     instead of TARGET_INTER_UNIT_MOVES.
>     * config/i386/mmx.md (*mov<mode>_internal): Change constraint of
>     operand 1 from Yi to Yj and from Ym to Yn.
>
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
> {,-m32} and committed to mainline SVN.
>
> Uros.



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]