Bug 30970 - Register zeroing by xor N,N should be moved out of loop
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.3.0
Importance: P3 minor
Target Milestone: 4.3.0
Assignee: Uroš Bizjak
Keywords: ssemmx
Reported: 2007-02-26 13:35 UTC by Uroš Bizjak
Modified: 2007-03-02 18:37 UTC
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Last reconfirmed: 2007-02-26 15:48:01


Description Uroš Bizjak 2007-02-26 13:35:25 UTC
The testcase:

--cut here--
#define N 256
int b[N];

void test()
{  
  int i;

  for (i = 0; i < N; i++)
    b[i] = 0;
}
--cut here--

compiles with '-O2 -msse2 -ftree-vectorize' into:

test:
        movl    $16, %eax
        pxor    %xmm0, %xmm0
        movdqa  %xmm0, b
        .p2align 4,,7
.L2:
        pxor    %xmm0, %xmm0
        movdqa  %xmm0, b(%eax)
        addl    $16, %eax
        cmpl    $1024, %eax
        jne     .L2
        rep ; ret

Please note that the second pxor, the one inside the loop, is _not_ needed: %xmm0 is already zero when the loop is entered. The zeroing is loop invariant, so it should be moved out of the loop.
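
For illustration only, here is a hand-written sketch of the shape the vectorized loop should have, using GCC's generic vector extension instead of relying on the vectorizer. The names test_hoisted and v4si are made up for this sketch, and the aligned attribute on b is an assumption added so that the 16-byte stores are valid; this is not compiler output.

```c
#define N 256
int b[N] __attribute__((aligned(16)));

/* Four 32-bit ints in one 128-bit vector (GCC vector extension).
   may_alias lets us store through this type into the int array.
   The type name is hypothetical, chosen for this sketch.  */
typedef int v4si __attribute__((vector_size(16), may_alias));

void test_hoisted(void)
{
  /* Loop-invariant zero vector, materialized once before the loop;
     with SSE2 this is expected to correspond to a single pxor.  */
  const v4si zero = { 0, 0, 0, 0 };
  int i;

  /* Each iteration only stores; nothing is re-zeroed inside the loop.  */
  for (i = 0; i < N; i += 4)
    *(v4si *) &b[i] = zero;
}
```

The point of the sketch is only that the zero vector is created once and reused by every store, which is what the 'b[i] = 1' variant of the testcase already gets.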

For a slightly different testcase, with 'b[i] = 1' (or any other nonzero value), we get optimized code:

test:
        movl    $16, %eax
        movdqa  .LC0, %xmm0
        movdqa  %xmm0, b
        .p2align 4,,7
.L2:
        movdqa  %xmm0, b(%eax)
        addl    $16, %eax
        cmpl    $1024, %eax
        jne     .L2
        rep ; ret

It looks like (g)cse doesn't know what 'xor N,N' means.
Comment 1 Uroš Bizjak 2007-02-26 15:48:01 UTC
It is a target issue. Working on a fix.
Comment 2 Richard Biener 2007-02-26 17:35:44 UTC
Shouldn't rtl invariant motion catch this?
Comment 3 Uroš Bizjak 2007-02-26 19:51:03 UTC
(In reply to comment #2)
> Shouldn't rtl invariant motion catch this?

It would be nice, but the problem is, once again, that we lie to the compiler about which instructions are supported. The following is not a valid x86 insn, since it stores a vector constant directly to memory:

(insn 12 8 13 2 (set (mem:V4SI (symbol_ref:DI ("b") <var_decl 0x2aaaae15b000 b>) [3 S16 A128])
        (const_vector:V4SI [
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
            ])) 919 {*movv4si_internal} (nil)
    (nil))

This insn is later split into pxor+store, unfortunately rather late in the game, after the RTL optimizers have already done their job.
Comment 4 uros 2007-02-27 21:27:38 UTC
Subject: Bug 30970

Author: uros
Date: Tue Feb 27 21:27:27 2007
New Revision: 122387

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122387
Log:
        PR target/30970
        * config/i386/sse.md (*mov<mode>_internal, *movv4sf_internal,
        *movv2df_internal): Enable pattern only for valid operand
        combinations.
        * config/i386/i386.c (ix86_modes_tieable_p): For SSE registers,
        tie only 128bit modes. For MMX registers, tie only 64bit modes.

testsuite/ChangeLog:

	PR target/30970
	* gcc.target/i386/pr30970.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr30970.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
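
As a reading aid for the ix86_modes_tieable_p entry in the log above, the stated rule can be modelled roughly as below. This is a toy model only: struct toy_mode, its fields and toy_modes_tieable_p are hypothetical names invented for this sketch, it ignores every other register class, and it is not the code from i386.c.

```c
#include <stdbool.h>

/* Toy description of a machine mode; all names are hypothetical.  */
struct toy_mode {
  unsigned size;  /* size in bytes */
  bool sse_ok;    /* mode may live in an SSE register */
  bool mmx_ok;    /* mode may live in an MMX register */
};

/* Toy version of the rule from the log: SSE modes tie only with
   other 128-bit SSE modes, MMX modes only with other 64-bit MMX
   modes.  Everything else is treated as not tieable here.  */
bool toy_modes_tieable_p(struct toy_mode m1, struct toy_mode m2)
{
  if (m2.size == 16 && m2.sse_ok)
    return m1.size == 16 && m1.sse_ok;

  if (m2.size == 8 && m2.mmx_ok)
    return m1.size == 8 && m1.mmx_ok;

  return false;  /* simplification of this sketch */
}
```

Under this model a 4-byte integer mode is never tied with a 16-byte vector mode, even if both are allowed in SSE registers.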

Comment 5 Uroš Bizjak 2007-03-02 14:54:33 UTC
Fixed in mainline.