Bug 36054 - bad code generation with -ftree-vectorize
Summary: bad code generation with -ftree-vectorize
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.3.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 36159
Blocks: vectorizer
  Show dependency treegraph
 
Reported: 2008-04-26 16:25 UTC by tim blechmann
Modified: 2021-08-23 23:09 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
preprocessed source file (562.26 KB, application/octet-stream)
2008-04-26 16:27 UTC, tim blechmann
Details
precompiled source (34.22 KB, text/plain)
2009-02-13 15:25 UTC, Dmitry Gorbachev
Details

Note You need to log in before you can comment on or make changes to this bug.
Description tim blechmann 2008-04-26 16:25:59 UTC
hi all,

(first of all, sorry for this unprofessional bug report)

compiling my application with gcc-4.3 with -O2 -ftree-vectorize, it segfaults. i haven't been able to write a stripped-down test case, but here are the information, that i gathered:

the constructor of the main data structure of my application contains:
0xb5eb02bd <Environment+205>:	movdqa %xmm0,0x1990(%edi)
0xb5eb02c5 <Environment+213>:	movdqa %xmm0,0x19a0(%edi)
0xb5eb02cd <Environment+221>:	movdqa %xmm0,0x19b0(%edi)
0xb5eb02d5 <Environment+229>:	movdqa %xmm0,0x19c0(%edi)
0xb5eb02dd <Environment+237>:	movdqa %xmm0,0x19d0(%edi)
0xb5eb02e5 <Environment+245>:	movdqa %xmm0,0x19e0(%edi)
0xb5eb02ed <Environment+253>:	movdqa %xmm0,0x19f0(%edi)
0xb5eb02f5 <Environment+261>:	movdqa %xmm0,0x1a00(%edi)

where %edi contains the this pointer to the class. the problem is, that the address, that Environment+205 tries to load seems not to be guarrantied to be aligned to a 16 byte boundary. 

in my debugging session, it pointed to 0x8369f58, 0x8369f58+0x1990 is not aligned as required by the movdqa instruction, though ...

i am using gcc-4.3:
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure linux gnu
Thread model: posix
gcc version 4.3.1 20080401 (prerelease) (Debian 4.3.0-3) 

the command line options are: -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -fPIC 

unfortunately i haven't been able to construct a smaller test-case ... gcc-4.2 works fine for me ...

the preprocessed source file is attached
Comment 1 tim blechmann 2008-04-26 16:27:35 UTC
Created attachment 15533 [details]
preprocessed source file

preprocessed source file
Comment 2 victork 2008-04-30 02:48:40 UTC
Hello,

I've tried to complile the attached kernel_build.ii on my SUSE SLES 10 x86_64 machine, but got a bunch of compile errors like this:

In file included from /usr/lib/gcc/i486-linux-gnu/4.3.1/include/xmmintrin.h:40,
                 from source/dsp/simd_sse.hpp:24,
                 from source/dsp/arithmetic_wrapper.hpp:32,
                 from source/kernel/audio_backend/audio_backend.hpp:35,
                 from source/kernel/scheduler.hpp:33,
                 from source/kernel/class_system/classloader.hpp:33,
                 from source/kernel/class_system/class.hpp:33,
                 from source/kernel/object_system/gobj.hpp:31,
                 from source/nova_includes.hpp:31,
                 from source/nova.hpp:30,
                 from /home/tim/workspace/nova/source/kernel/audio_backend/audio_backend.cpp:27,
                 from /home/tim/workspace/nova/release/kernel/audio_backend/build.cpp:2,
                 from release/kernel/kernel_build.cpp:2:
/usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h: In function &#1074;&#1026;int __vector__ _mm_add_si64(int __vector__, int __vector__)&#1074;&#1026;™:
/usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h:312: error: cannot convert &#1074;&#1026;long long int&#1074;&#1026;™ to &#1074;&#1026;long long int __vector__&#1074;&#1026;™ for argument &#1074;&#1026;1&#1074;&#1026;™ to &#1074;&#1026;long long int __vector__ __builtin_ia32_paddq(long long int __vector__, long long int __vector__)&#1074;&#1026;™


Anyway. I see that recently were fixed some problems related to stack alignment like PR35496. Can you try to compile your example with latest gcc 4.4 compiler from mainline?
Comment 3 tim blechmann 2008-04-30 08:40:50 UTC
have you tried to compile with -march=core2 -mfpmath=sse -msse? i guess, that is required to compile the preprocessed source file correctly ...

i will try gcc-4.4, when i find the time to compile it ...
Comment 4 victork 2008-04-30 09:44:03 UTC
> have you tried to compile with -march=core2 -mfpmath=sse -msse?
Yes, I've compiled it as following:
% g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -fPIC kernel_build.ii
Comment 5 tim blechmann 2008-04-30 09:58:43 UTC
odd, it compiled fine for me:

tim@laptop:~/workspace/nova$ g++-4.3 -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096  -Wnon-virtual-dtor -fPIC kernel_build.ii -c
tim@laptop:~/workspace/nova$ 

tim@laptop:~/workspace/nova$ g++-4.3 -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure linux gnu
Thread model: posix
gcc version 4.3.1 20080401 (prerelease) (Debian 4.3.0-3) 
Comment 6 Uroš Bizjak 2008-04-30 10:37:11 UTC
(In reply to comment #4)
> > have you tried to compile with -march=core2 -mfpmath=sse -msse?
> Yes, I've compiled it as following:
> % g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096
> -Wnon-virtual-dtor -fPIC kernel_build.ii

-m32 ? 
Comment 7 victork 2008-04-30 10:51:56 UTC
> -m32?

Better, but a bunch of 45 errors like below remained.

% g++  -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -fPIC kernel_build.ii > log 2>&1

/usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h: In function &#1074;&#1026;int __vector__ _mm_add_si64(int __vector__, int __vector__)&#1074;&#1026;™:
/usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h:312: error: cannot convert &#1074;&#1026;long long int&#1074;&#1026;™ to &#1074;&#1026;long long int __vector__&#1074;&#1026;™ for argument &#1074;&#1026;1&#1074;&#1026;™ to &#1074;&#1026;long long int __vector__ __builtin_ia32_paddq(long long int __vector__, long long int __vector__)&#1074;&#1026;™
Comment 8 Joey Ye 2008-04-30 10:53:13 UTC
(In reply to comment #6)
> (In reply to comment #4)
> > > have you tried to compile with -march=core2 -mfpmath=sse -msse?
> > Yes, I've compiled it as following:
> > % g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096
> > -Wnon-virtual-dtor -fPIC kernel_build.ii
> -m32 ? 

-m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline change of ia32 intrinsic conflict with 4.3.0 header files.

I'm using 4.3.0. Compilation passes but I still got link errors like:
/tmp/ccfJXXcV.o:(.rodata._ZTVN9portaudio20MemFunCallbackStreamIN4nova16PortAudioBackendEEE[vtable for portaudio::MemFunCallbackStream<nova::PortAudioBackend>]+0x10): undefined reference to `portaudio::Stream::close()'
/tmp/ccfJXXcV.o:(.rodata._ZTIN9portaudio20MemFunCallbackStreamIN4nova16PortAudioBackendEEE[typeinfo for portaudio::MemFunCallbackStream
Comment 9 Joey Ye 2008-04-30 10:56:59 UTC
(In reply to comment #8)
> -m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline change
Correction: -m32 is a must, but doesn't fix all. Options I'm using:
 g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor  -m32 
Comment 10 Uroš Bizjak 2008-04-30 12:31:49 UTC
(In reply to comment #7)
> > -m32?
> 
> Better, but a bunch of 45 errors like below remained.
> 
> % g++  -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096
> -Wnon-virtual-dtor -fPIC kernel_build.ii > log 2>&1
> 
> /usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h: In function
> &#1074;&#1026;int __vector__ _mm_add_si64(int __vector__, int
> __vector__)&#1074;&#1026;™:
> /usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h:312: error: cannot convert
> &#1074;&#1026;long long int&#1074;&#1026;™ to &#1074;&#1026;long long int
> __vector__&#1074;&#1026;™ for argument &#1074;&#1026;1&#1074;&#1026;™ to
> &#1074;&#1026;long long int __vector__ __builtin_ia32_paddq(long long int
> __vector__, long long int __vector__)&#1074;&#1026;™

Just remove these functions (64bit MMX arithmetic and MMX shift builtins) from the file. You are compiling with 4.4 that has changed these intrinsic functions (and relevant builtins). The mismatch is created since this preprocessed source includes old (4.3.1) headers.

Anyway, I can't check if 4.4 generates correct code since the test isn't executable. For a runtime problems, executable test is needed, and since the failure happens in Environment function, only this function should be present in the executable.
Comment 11 Joey Ye 2008-05-01 04:31:29 UTC
Tim,

Since it doesn't link, I can only check the .s file. There are a couple of constructor called Environment, which one is the problemetic function?

grep Environment kernel_build.s|grep glob
...
.globl _ZN4nova11EnvironmentD1Ev
.globl _ZN4nova11EnvironmentD2Ev
.globl _ZN4nova11EnvironmentC1Ev
Comment 12 tim blechmann 2008-05-01 13:14:17 UTC
> Since it doesn't link, I can only check the .s file. There are a couple of
> constructor called Environment, which one is the problemetic function?

sorry for not providing a stripped down test case ...
the specific constructor is the default constructor:
_ZN4nova11EnvironmentC1Ev

in my version of the assembler file, the specific line is:
	movdqa	%xmm0, 6544(%esi)

i hope, this is of any help ...

thanks, tim 
Comment 13 Joey Ye 2008-05-05 07:22:49 UTC
It is helpful. Root cause is that memory allocated by new is only aligned to 8 bytes under i386. In your case, object Environment is allocated by new and its constructor tried to use movdqa to initialize its members. Following small case shows the problem:
/* Compile with option -m32 -msse2 
   Current behavior: runtime segment fault
 */
#include <stdio.h>
#include <emmintrin.h>

struct A {
public:
	__m128i m;
	void init() { m = _mm_setzero_si128(); }
};

int main()
{
	A * a = new A;
	printf("Address of A: %p\n", a);
	a->init();
	delete a;
	return 0;
}
Comment 14 Joey Ye 2008-05-05 07:29:33 UTC
HJ, 

AVX will have the similar problem on x86_64, whose new only returns object aligned at 16 bytes. Dynamically allocated __m256 won't be guaranteed at 32 bytes boundary.


Comment 15 Andrew Pinski 2008-05-05 07:49:00 UTC
See http://gcc.gnu.org/ml/gcc/2006-10/msg00166.html and a couple others about a way to have an aligned operator new.
Comment 16 Andrew Pinski 2008-05-05 07:50:19 UTC
malloc has the same issue really.  And from what I heard last time, glibc does not want to change malloc to return the alignment.
Comment 17 Jakub Jelinek 2008-05-05 08:53:29 UTC
Comment #c16 doesn't make sense.  Of course malloc(3) can't be changed to return 
alignment, that would break all programs out there, violate many standards, etc.
There are posix_memalign and memalign, which work just fine.
Comment 18 tim blechmann 2008-05-05 09:09:28 UTC
hm, if that code is broken, what about the following one on x86_64 (__sync_bool_compare_and_swap_16 requires an alignment of 16 byte)?

struct __attribute__((aligned(16))) foo_t
{
    foo_t(int a = 0, int b = 0): a_(a), b_(b) {}

    void CAS (foo_t const & rhs)
    {
        typedef int TItype __attribute__ ((mode (TI)));

        union cu
        {
            long c[2];
            TItype l;
        };

        cu old;
        old.c[0] = (long)a_;
        old.c[1] = (long)b_;

        cu nw;
        nw.c[0] = (long)rhs.a_;
        nw.c[1] = (long)rhs.b_;


        __sync_bool_compare_and_swap_16(reinterpret_cast<volatile foo_t*>(this), old.l, nw.l);
    }

    int a_, b_;
};

int main()
{
    foo_t * f = new foo_t();
    f->CAS(foo_t(1,2));
}

thanks, tim
Comment 19 Andrew Pinski 2008-05-05 09:14:29 UTC
(In reply to comment #17)
> Comment #c16 doesn't make sense.  Of course malloc(3) can't be changed to
> return 
> alignment, that would break all programs out there, violate many standards,
> etc.

Right now malloc violates the C standard with respect of alignment.  I am not saying we should add an alignment argument to malloc but if the standard there is an alignment on the returned value, likewise for operator new.

-- Pinski
Comment 20 H.J. Lu 2008-05-06 16:10:39 UTC
It is a bug in testcase. But g++ doesn't diagnose it properly. I opened
PR 36159 for it.
Comment 21 Dmitry Gorbachev 2009-02-13 15:25:54 UTC
Created attachment 17293 [details]
precompiled source

The same issue with GCC 4.3.3 (i686-pc-linux-gnu).

C source: http://svn.savannah.gnu.org/viewvc/*checkout*/trunk/kqemu.c?root=qemu&revision=6338&content-type=text%2Fplain

cc1 -O3 -march=pentium4 kqemu.c

Offending asm code:

kqemu_cpu_exec:
...
        movl    8(%ebp), %ebx
...
.L96:
        movdqa  (%ebx), %xmm0  // segmentation fault
Comment 22 Dmitry Gorbachev 2009-04-20 19:31:40 UTC
(In reply to comment #21)

Sorry, it seems it's because of malloc(), not a GCC bug...