Bug 66852 - vmovdqa instructions issued on 64-bit aligned array, causes segfault
Summary: vmovdqa instructions issued on 64-bit aligned array, causes segfault
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 4.9.2
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-13 01:44 UTC by Jeffrey Walton
Modified: 2023-05-06 02:14 UTC (History)
2 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-07-13 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jeffrey Walton 2015-07-13 01:44:03 UTC
My apologies for *not* having a minimum working example. Sometimes its hard to craft them, and this is one of those times.

The C++ code below causes a segfault on [relative] line 10 when using GCC 4.9/x86_64 with -O3. Line 10 is:

    ((word64*)buf)[i] ^= ((word64*)mask)[i];

From a disassembly, here's the offending code:

   0x0000000000539fae <+206>:   vmovdqu (%rcx,%r10,1),%xmm0
   0x0000000000539fb4 <+212>:   vinsertf128 $0x1,0x10(%rcx,%r10,1),%ymm0,%ymm0
   0x0000000000539fbc <+220>:   vxorps 0x0(%r13,%r10,1),%ymm0,%ymm0
=> 0x0000000000539fc3 <+227>:   vmovdqa %ymm0,0x0(%r13,%r10,1)

Looking at vmovdqa requirements, it appears it requires 128-bit aligned words. However, the array starts as a 'byte*' (unsigned char) and then is cast depending on the alignment.

In this case, its cast to a 64-bit word pointer. Here is how word64 is defined:

    #if defined(_MSC_VER) || defined(__BORLANDC__)
        typedef unsigned __int64 word64;
        #define W64LIT(x) x##ui64
    #else
        typedef unsigned long long word64;
        #define W64LIT(x) x##ULL
    #endif

**********

One system:

    $ g++ --version
    g++ (Debian 4.9.2-10) 4.9.2

    $ uname -a
    Linux debian-8-x64 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux

**********

Same problem, another system:

    $ g++ --version
    g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)

    $ uname -a
    Linux localhost.localdomain 4.0.6-200.fc21.x86_64 #1 SMP Tue Jun 23 13:59:12 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

**********

I am able to tame the problem with the following, so I guess its a potential work around (though I'd be happy to get other suggestions):

#pragma GCC optimize push
#pragma GCC optimize ("-O2")

void xorbuf(byte *buf, const byte *mask, size_t count)
{
    ...
}

#pragma GCC optimize pop

**********

void xorbuf(byte *buf, const byte *mask, size_t count)
{
    size_t i;

    if (IsAligned<word32>(buf) && IsAligned<word32>(mask))
    {
        if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64>(buf) && IsAligned<word64>(mask))
        {
            for (i=0; i<count/8; i++)
                ((word64*)buf)[i] ^= ((word64*)mask)[i];
            count -= 8*i;
            if (!count)
                return;
            buf += 8*i;
            mask += 8*i;
        }

        for (i=0; i<count/4; i++)
            ((word32*)buf)[i] ^= ((word32*)mask)[i];
        count -= 4*i;
        if (!count)
            return;
        buf += 4*i;
        mask += 4*i;
    }

    for (i=0; i<count; i++)
        buf[i] ^= mask[i];
}
Comment 1 Jeffrey Walton 2015-07-13 02:31:57 UTC
This also appears to be an issue with GCC 4.8 and 5.1. See https://groups.google.com/d/msg/cryptopp-users/BlPiQ2eAWhg/IsX18wAT9ZAJ.
Comment 2 Jeffrey Walton 2015-07-13 02:42:50 UTC
> My apologies for *not* having a minimum working example.

If you want to duplicate it, then:

    git clone https://github.com/weidai11/cryptopp.git

Open GNUmakefile, and change to (around line 3):

    OPTIMIZE ?= -O3

Depending on when we checkin the fix, you may (or may not) experience the bug. I don't know how to tell git to checkout a particular revision because it does not use them. I guess you should do something to get a version of the sources prior to Sunday, July 12, 2015.

If the proposed fix is applied, then remove the optimize pragma from misc.cpp (begins around line 20):

#pragma GCC push_options
#pragma GCC optimize ("-O2")

void xorbuf(byte *buf, const byte *mask, size_t count)
{
    ...
}

#pragma GCC pop_options
Comment 3 Richard Biener 2015-07-13 08:49:58 UTC
Please at least attach preprocessed source of the file containing xorbuf ()
Comment 4 Markus Trippelsdorf 2015-07-13 09:20:35 UTC
If you build with -fsanitize=undefined you'll see:

algparam.h:204:87: runtime error: reference binding to null pointer of type 'const struct Integer'
algparam.h:217:88: runtime error: reference binding to null pointer of type 'const struct Integer'
algparam.h:220:88: runtime error: reference binding to null pointer of type 'const struct Integer'
crc.cpp:134:28: runtime error: load of misaligned address 0x00000127df9f for type 'const word32', which requires 4 byte alignment
filters.cpp:280:39: runtime error: null pointer passed as argument 1, which is declared to never be null
filters.cpp:280:39: runtime error: null pointer passed as argument 2, which is declared to never be null
filters.cpp:281:50: runtime error: null pointer passed as argument 1, which is declared to never be null
filters.cpp:281:50: runtime error: null pointer passed as argument 2, which is declared to never be null
filters.cpp:291:28: runtime error: null pointer passed as argument 1, which is declared to never be null
filters.cpp:518:84: runtime error: null pointer passed as argument 1, which is declared to never be null
filters.cpp:518:84: runtime error: null pointer passed as argument 2, which is declared to never be null
filters.cpp:676:35: runtime error: null pointer passed as argument 2, which is declared to never be null
misc.cpp:101:29: runtime error: load of misaligned address 0x7ffdab82d529 for type 'word32', which requires 4 byte alignment
misc.cpp:101:50: runtime error: load of misaligned address 0x00000127df95 for type 'word32', which requires 4 byte alignment
misc.cpp:26:43: runtime error: load of misaligned address 0x7ffdab82c713 for type 'word64', which requires 8 byte alignment
misc.cpp:35:43: runtime error: load of misaligned address 0x000003c65c4a for type 'word32', which requires 4 byte alignment
misc.cpp:56:68: runtime error: store to misaligned address 0x000003c67653 for type 'word64', which requires 8 byte alignment
misc.cpp:66:67: runtime error: store to misaligned address 0x000003c67651 for type 'word32', which requires 4 byte alignment
misc.cpp:91:30: runtime error: load of misaligned address 0x7ffdab82d511 for type 'word64', which requires 8 byte alignment
misc.h:1163:31: runtime error: load of misaligned address 0x00000127d8aa for type 'const unsigned int', which requires 4 byte alignment
misc.h:1181:2: runtime error: load of misaligned address 0x000003c67669 for type 'const unsigned int', which requires 4 byte alignment
misc.h:177:26: runtime error: null pointer passed as argument 2, which is declared to never be null
misc.h:623:22: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
misc.h:635:22: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
misc.h:641:22: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
misc.h:962:23: runtime error: load of misaligned address 0x7ffdab82c702 for type 'const long long unsigned int', which requires 8 byte alignment
panama.cpp:371:11: runtime error: load of misaligned address 0x7ffdab82c716 for type 'const word32', which requires 4 byte alignment
panama.cpp:371:18: runtime error: load of misaligned address 0x7ffdab82c71a for type 'const word32', which requires 4 byte alignment
panama.cpp:371:25: runtime error: load of misaligned address 0x7ffdab82c71e for type 'const word32', which requires 4 byte alignment
... etc.
Comment 5 Markus Trippelsdorf 2015-07-13 09:28:49 UTC
See PR65709 for a similar issue.
Comment 6 Richard Biener 2015-07-13 09:30:20 UTC
So I suppose the IsAligned template is wrong.
Comment 7 Markus Trippelsdorf 2015-07-13 09:58:46 UTC
(In reply to Richard Biener from comment #6)
> So I suppose the IsAligned template is wrong.

Yes.

 390 template <class T>                                                                                                                                                       
 391 inline unsigned int GetAlignmentOf(T *dummy=NULL)       // VC60 workaround                                                                                               
 392 {                                                                                                                                                                        
 393 #ifdef CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS                                                                                                                              
 394         if (sizeof(T) < 16)                                                                                                                                              
 395                 return 1;                                                                                                                                                
 396 #endif                                                                                                                                                                   
 397                                                                                                                                                                          
 398 #if (_MSC_VER >= 1300)                                                                                                                                                   
 399         return __alignof(T);                                                                                                                                             
 400 #elif defined(__GNUC__)                                                                                                                                                  
 401         return __alignof__(T);                                                                                                                                           
 402 #elif CRYPTOPP_BOOL_SLOW_WORD64                                                                                                                                          
 403         return UnsignedMin(4U, sizeof(T));                                                                                                                               
 404 #else                                                                                                                                                                    
 405         return sizeof(T);                                                                                                                                                
 406 #endif                                                                                                                                                                   
 407 }                                                                                                                                                                        
 408                                                                                                                                                                          
 409 inline bool IsAlignedOn(const void *p, unsigned int alignment)                                                                                                           
 410 {                                                                                                                                                                        
 411         return alignment==1 || (IsPowerOf2(alignment) ? ModPowerOf2((size_t)p, alignment) == 0 : (size_t)p % alignment == 0);                                            
 412 }                                                                                                                                                                        
 413                                                                                                                                                                          
 414 template <class T>                                                                                                                                                       
 415 inline bool IsAligned(const void *p, T *dummy=NULL)     // VC60 workaround                                                                                               
 416 {                                                                                                                                                                        
 417         return IsAlignedOn(p, GetAlignmentOf<T>());                                                                                                                      
 418 } 

and 
345 #if CRYPTOPP_BOOL_X64 || CRYPTOPP_BOOL_X86 || defined(__powerpc__)                                                                                                        
346         #define CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS                                                                                                                      
347 #endif 

The only architecture where the assumption is valid is POWER8, where
unaligned vector loads are preferable to ones with forced-alignment.
Comment 8 Jonathan Wakely 2015-07-13 10:13:49 UTC
The code in algparam.h is just disgusting. AssignFromHelperClass binds a reference to NULL just to default-construct a temporary of some type, then binds a const-reference to that temporary, then casts away const to modify the value of that temporary. What seems to be a "VC60 workaround" introduces undefined behaviour by trying to be far too clever. Apparently also too clever to use compiler warnings.
Comment 9 Jeffrey Walton 2015-07-13 10:29:08 UTC
(In reply to Jonathan Wakely from comment #8)
> The code in algparam.h is just disgusting. AssignFromHelperClass binds a
> reference to NULL just to default-construct a temporary of some type, then
> binds a const-reference to that temporary, then casts away const to modify
> the value of that temporary. What seems to be a "VC60 workaround" introduces
> undefined behaviour by trying to be far too clever. Apparently also too
> clever to use compiler warnings.

Lol...

Yeah, some of the legacy code is awful. I'm not throwing stones because the library supported so many compilers and IDEs back then.

I think its time to start cleaning up some of the cruft. I'm going to cite this comment when I propose some of the changes.
Comment 10 Jeffrey Walton 2015-07-13 10:34:03 UTC
(In reply to Richard Biener from comment #6)
> So I suppose the IsAligned template is wrong.

So I'm clear (please forgive my ignorance)...

According to http://www.felixcloutier.com/x86/MOVDQA.html, vmovdqa requires values double quad word alignment (16-byte). A word64 is aligned on 8-bytes, and does not meet the precondition to use the instruction. Naively, that seems like a problem to me.

Will that create problems with GCC and vectorizations?
Comment 11 Jeffrey Walton 2015-07-13 10:44:36 UTC
(In reply to Jonathan Wakely from comment #8)
> The code in algparam.h is just disgusting. AssignFromHelperClass binds a
> reference to NULL just to default-construct a temporary of some type, then
> binds a const-reference to that temporary, then casts away const to modify
> the value of that temporary. What seems to be a "VC60 workaround" introduces
> undefined behaviour by trying to be far too clever. Apparently also too
> clever to use compiler warnings.

The project is finally using -Wall, and its not flagging any issues with that code.

I know -Wdelete-non-virtual-dtor is a problem (and potential security problem), and I'm trying to figure how how to proceed. Cf.https://groups.google.com/d/msg/cryptopp-users/__m0euOdbEQ/tvINItctzjAJ, 

What additional warnings would you suggest?
Comment 12 Markus Trippelsdorf 2015-07-13 10:46:14 UTC
(In reply to Jeffrey Walton from comment #10)
> (In reply to Richard Biener from comment #6)
> > So I suppose the IsAligned template is wrong.
> 
> So I'm clear (please forgive my ignorance)...
> 
> According to http://www.felixcloutier.com/x86/MOVDQA.html, vmovdqa requires
> values double quad word alignment (16-byte). A word64 is aligned on 8-bytes,
> and does not meet the precondition to use the instruction. Naively, that
> seems like a problem to me.
> 
> Will that create problems with GCC and vectorizations?

See PR65709 Comment 9 and followups.
Comment 13 Jeffrey Walton 2015-07-14 02:54:50 UTC
A quick update....

We did out best to take the advice of Jakub, Janathan, Markus and others: https://github.com/weidai11/cryptopp/commit/9bf0eed0f6ff6d0b0a2d277e5401d69dc8c0e394.

We are paying for past transgressions, and cleanup is not as easy as we would like. Eventually, we want to enable CRYPTOPP_NO_UNALIGNED_DATA_ACCESS by default. It removes all related undefined behavior flagged by UBsan. The only thing stopping us is the politics of such a move.