30970 – Register zeroing by xor N,N should be moved out of loop

Summary: Register zeroing by xor N,N should be moved out of loop

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.3.0

Importance:	P3 minor
Target Milestone:	4.3.0
Assignee:	Uroš Bizjak

URL:
Keywords:	ssemmx

Depends on:
Blocks:

Reported:	2007-02-26 13:35 UTC by Uroš Bizjak
Modified:	2007-03-02 18:37 UTC
CC List:	2 users

See Also:
Host:	i686-pc-linux-gnu
Target:	i686-pc-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:	2007-02-26 15:48:01

Description Uroš Bizjak 2007-02-26 13:35:25 UTC

The testcase:

--cut here--
#define N 256
int b[N];

void test()
{  
  int i;

  for (i = 0; i < N; i++)
    b[i] = 0;
}
--cut here--

compiles with '-O2 -msse2 -ftree-vectorize' into:

test:
        movl    $16, %eax
        pxor    %xmm0, %xmm0
        movdqa  %xmm0, b
        .p2align 4,,7
.L2:
        pxor    %xmm0, %xmm0
        movdqa  %xmm0, b(%eax)
        addl    $16, %eax
        cmpl    $1024, %eax
        jne     .L2
        rep ; ret

Please note the second pxor, which is _not_ needed. Moreover, since it is loop-invariant, it should be moved out of the loop.
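For comparison, the loop shape the report asks for can be written by hand with SSE2 intrinsics: the zero vector is created once before the loop, and only the store is repeated. This is a minimal sketch, assuming a compiler that provides `<emmintrin.h>` (on i686 this needs `-msse2`). The function name `test_hoisted` is hypothetical, and the explicit 16-byte alignment of `b` is an assumption added so that the aligned store (`movdqa`) is legal for every element.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

#define N 256

/* 16-byte alignment so the aligned store (movdqa) is legal for every i. */
_Alignas(16) int b[N];

/* Hand-written version of the desired code: zero is built once, before the loop. */
void test_hoisted(void)
{
  const __m128i zero = _mm_setzero_si128();  /* typically a single pxor */

  /* Each iteration stores four ints; the loop body never recomputes zero. */
  for (int i = 0; i < N; i += 4)
    _mm_store_si128((__m128i *)&b[i], zero);
}
```

This only shows the intended code shape; it does not change what the vectorizer emits for the original testcase.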

For a slightly different testcase, where 'b[i] = 1' (or any other nonzero value), we get optimized code:

test:
        movl    $16, %eax
        movdqa  .LC0, %xmm0
        movdqa  %xmm0, b
        .p2align 4,,7
.L2:
        movdqa  %xmm0, b(%eax)
        addl    $16, %eax
        cmpl    $1024, %eax
        jne     .L2
        rep ; ret

It looks like (g)cse doesn't know what 'xor N,N' means.
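The suspicion that (g)cse does not understand 'xor N,N' can be illustrated with a toy loop-invariance check. This is a hypothetical model, not GCC's actual pass: the `insn` structure, the register numbering and `is_invariant` are invented for illustration, and the check ignores conditions a real pass would also need (for example, that the destination is not set elsewhere in the loop). Read as an ordinary instruction, `pxor %xmm0, %xmm0` uses `%xmm0`, which is itself written inside the loop, so it looks loop-variant; only once the self-xor is known to yield 0 regardless of its input does it become hoistable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy register numbers and opcodes (hypothetical, for illustration only). */
enum { XMM0, EAX };
typedef enum { OP_XOR, OP_ADD, OP_STORE } opcode;

/* dst = op(src1, src2); unused fields are -1. */
typedef struct {
  opcode op;
  int dst, src1, src2;
} insn;

/* Is register REG written by any insn in the loop body? */
static bool defined_in_loop(const insn *body, size_t n, int reg)
{
  for (size_t i = 0; i < n; i++)
    if (body[i].dst == reg)
      return true;
  return false;
}

/* Toy invariance test.  With KNOWS_SELF_XOR, "xor r,r" is treated as the
   constant 0, so its (loop-defined) input no longer matters.  */
static bool is_invariant(const insn *body, size_t n, const insn *in,
                         bool knows_self_xor)
{
  if (in->op == OP_STORE)
    return false;                      /* toy rule: never hoist stores */
  if (knows_self_xor && in->op == OP_XOR && in->src1 == in->src2)
    return true;                       /* result is 0 whatever r holds */
  if (in->src1 >= 0 && defined_in_loop(body, n, in->src1))
    return false;
  if (in->src2 >= 0 && defined_in_loop(body, n, in->src2))
    return false;
  return true;
}

/* The loop body from the report (.L2), minus the compare and branch. */
static const insn loop_body[] = {
  { OP_XOR,   XMM0, XMM0, XMM0 },  /* pxor   %xmm0, %xmm0    */
  { OP_STORE, -1,   XMM0, EAX  },  /* movdqa %xmm0, b(%eax)  */
  { OP_ADD,   EAX,  EAX,  -1   },  /* addl   $16, %eax       */
};
```

Without the self-xor rule, the pxor is classified as variant; with it, the pxor is invariant, while the store and the induction-variable update stay in the loop.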

Comment 1 Uroš Bizjak 2007-02-26 15:48:01 UTC

It is a target issue. Working on a fix.

Comment 2 Richard Biener 2007-02-26 17:35:44 UTC

Shouldn't rtl invariant motion catch this?

Comment 3 Uroš Bizjak 2007-02-26 19:51:03 UTC

(In reply to comment #2)
> Shouldn't rtl invariant motion catch this?

It would be nice, but the problem, again, is that we lie to the compiler about supported instructions. This one is not a valid x86 insn:

(insn 12 8 13 2 (set (mem:V4SI (symbol_ref:DI ("b") <var_decl 0x2aaaae15b000 b>) [3 S16 A128])
        (const_vector:V4SI [
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
            ])) 919 {*movv4si_internal} (nil)
    (nil))

This insn is later split into pxor+store, unfortunately a bit late in the game, after the RTL optimizers have already done their job.
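The point of this comment, and the "valid operand combinations" wording in the ChangeLog of the commit that fixed the bug, can be modeled as a predicate over the operands of a 128-bit SSE move. This sketch is one reading of that rule, not the condition actually added to sse.md: the names are hypothetical, and non-zero vector constants are assumed to have already been forced into the constant pool, so they appear as memory operands (as with `movdqa .LC0, %xmm0` in the report).

```c
#include <stdbool.h>

/* Toy operand kinds for a 128-bit SSE move "dst = src" (hypothetical). */
typedef enum { OPND_REG, OPND_MEM, OPND_ZERO_VEC } opnd_kind;

/* True if the move maps onto a single real x86 instruction. */
static bool valid_sse_move(opnd_kind dst, opnd_kind src)
{
  switch (dst) {
  case OPND_REG:
    /* movdqa reg/mem -> reg, or pxor reg,reg for an all-zero vector. */
    return src == OPND_REG || src == OPND_MEM || src == OPND_ZERO_VEC;
  case OPND_MEM:
    /* Only a register can be stored: no mem <- mem, no mem <- constant. */
    return src == OPND_REG;
  default:
    return false;  /* a constant is never a destination */
  }
}
```

Under this rule the `(set (mem:V4SI ...) (const_vector:V4SI [0 ...]))` insn above is rejected, so the zero presumably has to be placed in a register first, where it is visible to the RTL optimizers before splitting.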

Comment 4 uros 2007-02-27 21:27:38 UTC

Subject: Bug 30970

Author: uros
Date: Tue Feb 27 21:27:27 2007
New Revision: 122387

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122387
Log:
        PR target/30970
        * config/i386/sse.md (*mov<mode>_internal, *movv4sf_internal,
        *movv2df_internal): Enable pattern only for valid operand
        combinations.
        * config/i386/i386.c (ix86_modes_tieable_p): For SSE registers,
        tie only 128bit modes. For MMX registers, tie only 64bit modes.

testsuite/ChangeLog:

	PR target/30970
	* gcc.target/i386/pr30970.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr30970.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog

Comment 5 Uroš Bizjak 2007-03-02 14:54:33 UTC

Fixed in mainline.