33353 – Vector RTL arithmetic operations with constant arguments are not fully folded.

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization
Version:	4.3.0

Importance:	P3 enhancement
Target Milestone:	4.6.2
Assignee:	Not yet assigned to anyone

Reported:	2007-09-08 12:36 UTC by Uroš Bizjak
Modified:	2011-08-27 08:11 UTC
CC List:	2 users

Host:	x86_64-pc-linux-gnu
Target:	x86_64-pc-linux-gnu
Build:	x86_64-pc-linux-gnu
Last reconfirmed:	2007-09-09 17:41:42

Description Uroš Bizjak 2007-09-08 12:36:19 UTC

The following testcase from PR target/33329 shows that gcc doesn't fold vector arithmetic operations with constant arguments into a load of a vector constant.

For clarity, sse4 will be used, but the same problem is present on sse2.

--cut here--
extern void g (int *);

void f (void)
{
  int tabs[8], tabcount;

  for (tabcount = 1; tabcount <= 8; tabcount += 7)
    {
      int i;
      for (i = 0; i < 8; i++)
        tabs[i] = 2 * i;
      g (tabs);
    }
}
--cut here--

produces (gcc -O2 -msse4 -ftree-vectorize):

.LCFI2:
        movdqa  .LC0(%rip), %xmm1
        leaq    16(%rsp), %rbp
        movdqa  .LC1(%rip), %xmm0
        paddd   .LC2(%rip), %xmm1
        pmulld  %xmm1, %xmm0    # 19    *sse4_1_mulv4si3        [length = 4]
        movdqa  %xmm0, (%rsp)
.L2:
        movdqa  .LC3(%rip), %xmm0       # 54
        movq    %rbp, %rdi
        addl    $1, %ebx
        movdqa  (%rsp), %xmm2   # 55
        movdqa  %xmm0, (%rbp)
        movdqa  %xmm2, 16(%rbp)
        call    g
        cmpl    $2, %ebx
        jne     .L2

All instructions above the loop have constant arguments. This is evident from the combine RTL dump, where insn 19 is represented by the following RTX:

(insn 19 17 25 2 pr33329.c:13 (set (reg:V4SI 78)
        (mult:V4SI (reg:V4SI 77)
            (reg:V4SI 73))) 1136 {*sse4_1_mulv4si3} (expr_list:REG_DEAD (reg:V4SI 73)
        (expr_list:REG_EQUAL (const_vector:V4SI [
                    (const_int 8 [0x8])
                    (const_int 10 [0xa])
                    (const_int 12 [0xc])
                    (const_int 14 [0xe])
                ])
            (nil))))

In fact, gcc has already calculated the correct const_vector value (it is recorded in the REG_EQUAL note above), but it does not appear to use it. For optimal code, insn #55 should load the vector constant from the constant pool in the same way as insn #54.
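To make the expected fold concrete, here is a minimal sketch in plain C of the element-wise computation that the paddd/pmulld pair performs at run time. The contents of .LC0, .LC1 and .LC2 are not shown in this report, so the operand values below are assumptions chosen only to be consistent with the REG_EQUAL note; the function names are hypothetical and are not part of gcc.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED operand values: the report does not list the contents of
   .LC0, .LC1 or .LC2.  These are picked so that the result agrees
   with the REG_EQUAL note {8, 10, 12, 14}.  */
static const int lc0[4] = { 0, 1, 2, 3 };   /* assumed induction vector  */
static const int lc2[4] = { 4, 4, 4, 4 };   /* assumed step vector       */
static const int lc1[4] = { 2, 2, 2, 2 };   /* assumed multiplier vector */

/* Fold (a + b) * c element by element.  When all three operands are
   constant vectors, this result could be computed at compile time and
   emitted as a single constant-pool entry.  */
static void
fold_add_mul_v4si (const int a[4], const int b[4], const int c[4],
                   int out[4])
{
  size_t i;

  for (i = 0; i < 4; i++)
    out[i] = (a[i] + b[i]) * c[i];
}

/* Check the folded result against the const_vector from the
   REG_EQUAL note on insn 19.  */
static void
check_fold_against_note (void)
{
  static const int note[4] = { 8, 10, 12, 14 };
  int folded[4];
  size_t i;

  fold_add_mul_v4si (lc0, lc2, lc1, folded);
  for (i = 0; i < 4; i++)
    assert (folded[i] == note[i]);
}
```

Calling check_fold_against_note () from a test driver passes under these assumed operands, which illustrates that the upper half of tabs[] is a pure constant and could be loaded the same way as .LC3.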

Comment 1 Uroš Bizjak 2011-08-27 08:11:22 UTC

This is fixed as of at least gcc version 4.6.2 20110827 (prerelease):

f:
.LFB0:
	.cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	movq	%rsp, %rdi
	movl	$0, (%rsp)
	movl	$2, 4(%rsp)
	movl	$4, 8(%rsp)
	movl	$6, 12(%rsp)
	movl	$8, 16(%rsp)
	movl	$10, 20(%rsp)
	movl	$12, 24(%rsp)
	movl	$14, 28(%rsp)
	call	g
	movq	%rsp, %rdi
	movl	$0, (%rsp)
	movl	$2, 4(%rsp)
	movl	$4, 8(%rsp)
	movl	$6, 12(%rsp)
	movl	$8, 16(%rsp)
	movl	$10, 20(%rsp)
	movl	$12, 24(%rsp)
	movl	$14, 28(%rsp)
	call	g
	addq	$40, %rsp
	.cfi_def_cfa_offset 8
	ret