Bug 37774 - [4.4 Regression] Alignment information is lost for ARRAY_REFs
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.4.0
Importance: P3 normal
Target Milestone: 4.4.0
Assignee: Jakub Jelinek
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2008-10-08 17:08 UTC by H.J. Lu
Modified: 2008-10-09 11:29 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-10-08 20:51:43


Description H.J. Lu 2008-10-08 17:08:58 UTC
GCC 4.4 generates an extra load in a loop, compared with GCC 4.3:

[hjl@gnu-6 gcc]$ cat /tmp/b.c 
#include <tmmintrin.h>

extern __m128i src[10];
extern __m128i resdst[10];

void
foo (void)
{
  int i;

  for (i = 0; i < 10; i++)
    resdst[i] = _mm_abs_epi16 (src[i]);
}
[hjl@gnu-6 gcc]$ gcc -O2 -S /tmp/b.c -o old.s -mssse3 -fno-asynchronous-unwind-tables
[hjl@gnu-6 gcc]$ gcc --version
gcc (GCC) 4.3.0 20080428 (Red Hat 4.3.0-8)
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[hjl@gnu-6 gcc]$ cat old.s
	.file	"b.c"
	.text
	.p2align 4,,15
.globl foo
	.type	foo, @function
foo:
	xorl	%eax, %eax
	.p2align 4,,10
	.p2align 3
.L2:
	pabsw	src(%rax), %xmm0
	movdqa	%xmm0, resdst(%rax)
	addq	$16, %rax
	cmpq	$160, %rax
	jne	.L2
	rep
	ret
	.size	foo, .-foo
	.ident	"GCC: (GNU) 4.3.0 20080428 (Red Hat 4.3.0-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-6 gcc]$  ./xgcc -B./ -O2 -mssse3 -S /tmp/b.c -fno-asynchronous-unwind-tables
[hjl@gnu-6 gcc]$ cat b.s
	.file	"b.c"
	.text
	.p2align 4,,15
.globl foo
	.type	foo, @function
foo:
	xorl	%eax, %eax
	.p2align 4,,10
	.p2align 3
.L2:
	movdqu	src(%rax), %xmm0
	pabsw	%xmm0, %xmm0
	movdqu	%xmm0, resdst(%rax)
	addq	$16, %rax
	cmpq	$160, %rax
	jne	.L2
	rep
	ret
	.size	foo, .-foo
	.ident	"GCC: (GNU) 4.4.0 20081006 (experimental) [trunk revision 140917]"

There are 2 problems:

1. Alignment information is lost, so unaligned moves (movdqu) are generated instead of movdqa.
2. The load isn't needed at all.
Comment 1 Richard Biener 2008-10-08 20:06:37 UTC
How is the load not needed?
Comment 2 Andrew Pinski 2008-10-08 20:18:35 UTC
Really, only the alignment information is lost (note the A8 in the MEM attributes):
(mem/s:V16QI (plus:SI (reg/f:SI 68)
                (reg:SI 63 [ ivtmp.68 ])) [4 resdst S16 A8])

I think this is fixed by http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00325.html .

The load is needed.

If we use a pointer instead of an array we get:
L2:
        pabsw   (%ecx,%eax), %xmm0
        movdqa  %xmm0, (%edx,%eax)
        addl    $16, %eax
        cmpl    $160, %eax
        jne     L2

Note that since __m128i has the may_alias attribute, you have to load the global pointer before the loop yourself.
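
For reference, a pointer-based variant of the testcase might look like the sketch below. The exact source used for the assembly above was not posted, so the names src_p, dst_p and foo_ptr are hypothetical. The target attribute is an assumption added only so the sketch builds without -mssse3; it needs a compiler that supports it and an x86 CPU with SSSE3 to run.

```c
#include <tmmintrin.h>

/* Hypothetical pointer counterparts of src[] and resdst[].  */
__m128i *src_p;
__m128i *dst_p;

__attribute__ ((target ("ssse3")))
void
foo_ptr (void)
{
  /* Copy the global pointers into locals before the loop.  Because
     __m128i is may_alias, a store through dst_p could in principle
     modify src_p or dst_p themselves, so the compiler cannot hoist
     these loads out of the loop on its own.  */
  __m128i *s = src_p;
  __m128i *d = dst_p;
  int i;

  for (i = 0; i < 10; i++)
    d[i] = _mm_abs_epi16 (s[i]);
}
```

With the pointers held in locals, the loop body only has to address s[i] and d[i], which matches the two-instruction loop shown above.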
Comment 3 Jakub Jelinek 2008-10-08 20:51:43 UTC
Newer patch http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00350.html
Comment 4 H.J. Lu 2008-10-08 20:55:05 UTC
(In reply to comment #3)
> Newer patch http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00350.html
> 

With this patch, I got

.globl foo
	.type	foo, @function
foo:
	xorl	%eax, %eax
	.p2align 4,,10
	.p2align 3
.L2:
	pabsw	src(%rax), %xmm0
	movdqa	%xmm0, resdst(%rax)
	addq	$16, %rax
	cmpq	$160, %rax
	jne	.L2
	rep
	ret

The load is combined into pabsw. The extra load insn and unaligned move
are gone.
Comment 5 Jakub Jelinek 2008-10-09 08:18:33 UTC
Subject: Bug 37774

Author: jakub
Date: Thu Oct  9 08:17:08 2008
New Revision: 141003

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141003
Log:
	PR middle-end/37774
	* tree.h (get_object_alignment): Declare.
	* emit-rtl.c (set_mem_attributes_minus_bitpos): Call
	get_object_alignment if needed.
	* builtins.c (get_pointer_alignment): Move ADDR_EXPR operand handling
	to ...
	(get_object_alignment): ... here.  New function.  Try harder to
	determine alignment from get_inner_reference returned offset.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/emit-rtl.c
    trunk/gcc/tree.h
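
The idea behind "determine alignment from get_inner_reference returned offset" can be illustrated with a small stand-alone sketch. This is not the code committed to builtins.c: known_alignment is a hypothetical helper, it works in bytes rather than the bits GCC uses internally, and it assumes the address has the simple form base + k * stride + const_off with a power-of-two base alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GCC's get_object_alignment.  Return the
   largest power-of-two alignment (in bytes) guaranteed for the address
   BASE + k * STRIDE + CONST_OFF for every integer k, given that BASE
   is aligned to BASE_ALIGN bytes.  */
size_t
known_alignment (size_t base_align, size_t stride, size_t const_off)
{
  size_t align = base_align;
  size_t bits = stride | const_off;

  /* BASE_ALIGN must be a nonzero power of two.  */
  assert (base_align != 0 && (base_align & (base_align - 1)) == 0);

  if (bits != 0)
    {
      /* The lowest set bit of (stride | const_off) is the largest
         power of two dividing both the stride and the constant part.  */
      size_t low = bits & (~bits + 1);
      if (low < align)
        align = low;
    }
  return align;
}
```

For the testcase in this bug, src is a 16-byte aligned array of 16-byte elements, so known_alignment (16, 16, 0) yields 16 and an aligned access such as movdqa is safe for every src[i].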

Comment 6 Jakub Jelinek 2008-10-09 11:29:11 UTC
Fixed.