My apologies for *not* having a minimum working example. Sometimes its hard to craft them, and this is one of those times. The C++ code below causes a segfault on [relative] line 10 when using GCC 4.9/x86_64 with -O3. Line 10 is: ((word64*)buf)[i] ^= ((word64*)mask)[i]; From a disassembly, here's the offending code: 0x0000000000539fae <+206>: vmovdqu (%rcx,%r10,1),%xmm0 0x0000000000539fb4 <+212>: vinsertf128 $0x1,0x10(%rcx,%r10,1),%ymm0,%ymm0 0x0000000000539fbc <+220>: vxorps 0x0(%r13,%r10,1),%ymm0,%ymm0 => 0x0000000000539fc3 <+227>: vmovdqa %ymm0,0x0(%r13,%r10,1) Looking at vmovdqa requirements, it appears it requires 128-bit aligned words. However, the array starts as a 'byte*' (unsigned char) and then is cast depending on the alignment. In this case, its cast to a 64-bit word pointer. Here is how word64 is defined: #if defined(_MSC_VER) || defined(__BORLANDC__) typedef unsigned __int64 word64; #define W64LIT(x) x##ui64 #else typedef unsigned long long word64; #define W64LIT(x) x##ULL #endif ********** One system: $ g++ --version g++ (Debian 4.9.2-10) 4.9.2 $ uname -a Linux debian-8-x64 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux ********** Same problem, another system: $ g++ --version g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6) $ uname -a Linux localhost.localdomain 4.0.6-200.fc21.x86_64 #1 SMP Tue Jun 23 13:59:12 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux ********** I am able to tame the problem with the following, so I guess its a potential work around (though I'd be happy to get other suggestions): #pragma GCC optimize push #pragma GCC optimize ("-O2") void xorbuf(byte *buf, const byte *mask, size_t count) { ... } #pragma GCC optimize pop ********** void xorbuf(byte *buf, const byte *mask, size_t count) { size_t i; if (IsAligned<word32>(buf) && IsAligned<word32>(mask)) { if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64>(buf) && IsAligned<word64>(mask)) { for (i=0; i<count/8; i++) ((word64*)buf)[i] ^= ((word64*)mask)[i]; count -= 8*i; if (!count) return; buf += 8*i; mask += 8*i; } for (i=0; i<count/4; i++) ((word32*)buf)[i] ^= ((word32*)mask)[i]; count -= 4*i; if (!count) return; buf += 4*i; mask += 4*i; } for (i=0; i<count; i++) buf[i] ^= mask[i]; }
This also appears to be an issue with GCC 4.8 and 5.1. See https://groups.google.com/d/msg/cryptopp-users/BlPiQ2eAWhg/IsX18wAT9ZAJ.
> My apologies for *not* having a minimum working example. If you want to duplicate it, then: git clone https://github.com/weidai11/cryptopp.git Open GNUmakefile, and change to (around line 3): OPTIMIZE ?= -O3 Depending on when we checkin the fix, you may (or may not) experience the bug. I don't know how to tell git to checkout a particular revision because it does not use them. I guess you should do something to get a version of the sources prior to Sunday, July 12, 2015. If the proposed fix is applied, then remove the optimize pragma from misc.cpp (begins around line 20): #pragma GCC push_options #pragma GCC optimize ("-O2") void xorbuf(byte *buf, const byte *mask, size_t count) { ... } #pragma GCC pop_options
Please at least attach preprocessed source of the file containing xorbuf ()
If you build with -fsanitize=undefined you'll see: algparam.h:204:87: runtime error: reference binding to null pointer of type 'const struct Integer' algparam.h:217:88: runtime error: reference binding to null pointer of type 'const struct Integer' algparam.h:220:88: runtime error: reference binding to null pointer of type 'const struct Integer' crc.cpp:134:28: runtime error: load of misaligned address 0x00000127df9f for type 'const word32', which requires 4 byte alignment filters.cpp:280:39: runtime error: null pointer passed as argument 1, which is declared to never be null filters.cpp:280:39: runtime error: null pointer passed as argument 2, which is declared to never be null filters.cpp:281:50: runtime error: null pointer passed as argument 1, which is declared to never be null filters.cpp:281:50: runtime error: null pointer passed as argument 2, which is declared to never be null filters.cpp:291:28: runtime error: null pointer passed as argument 1, which is declared to never be null filters.cpp:518:84: runtime error: null pointer passed as argument 1, which is declared to never be null filters.cpp:518:84: runtime error: null pointer passed as argument 2, which is declared to never be null filters.cpp:676:35: runtime error: null pointer passed as argument 2, which is declared to never be null misc.cpp:101:29: runtime error: load of misaligned address 0x7ffdab82d529 for type 'word32', which requires 4 byte alignment misc.cpp:101:50: runtime error: load of misaligned address 0x00000127df95 for type 'word32', which requires 4 byte alignment misc.cpp:26:43: runtime error: load of misaligned address 0x7ffdab82c713 for type 'word64', which requires 8 byte alignment misc.cpp:35:43: runtime error: load of misaligned address 0x000003c65c4a for type 'word32', which requires 4 byte alignment misc.cpp:56:68: runtime error: store to misaligned address 0x000003c67653 for type 'word64', which requires 8 byte alignment misc.cpp:66:67: runtime error: store to misaligned address 0x000003c67651 for type 'word32', which requires 4 byte alignment misc.cpp:91:30: runtime error: load of misaligned address 0x7ffdab82d511 for type 'word64', which requires 8 byte alignment misc.h:1163:31: runtime error: load of misaligned address 0x00000127d8aa for type 'const unsigned int', which requires 4 byte alignment misc.h:1181:2: runtime error: load of misaligned address 0x000003c67669 for type 'const unsigned int', which requires 4 byte alignment misc.h:177:26: runtime error: null pointer passed as argument 2, which is declared to never be null misc.h:623:22: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int' misc.h:635:22: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int' misc.h:641:22: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int' misc.h:962:23: runtime error: load of misaligned address 0x7ffdab82c702 for type 'const long long unsigned int', which requires 8 byte alignment panama.cpp:371:11: runtime error: load of misaligned address 0x7ffdab82c716 for type 'const word32', which requires 4 byte alignment panama.cpp:371:18: runtime error: load of misaligned address 0x7ffdab82c71a for type 'const word32', which requires 4 byte alignment panama.cpp:371:25: runtime error: load of misaligned address 0x7ffdab82c71e for type 'const word32', which requires 4 byte alignment ... etc.
See PR65709 for a similar issue.
So I suppose the IsAligned template is wrong.
(In reply to Richard Biener from comment #6) > So I suppose the IsAligned template is wrong. Yes. 390 template <class T> 391 inline unsigned int GetAlignmentOf(T *dummy=NULL) // VC60 workaround 392 { 393 #ifdef CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS 394 if (sizeof(T) < 16) 395 return 1; 396 #endif 397 398 #if (_MSC_VER >= 1300) 399 return __alignof(T); 400 #elif defined(__GNUC__) 401 return __alignof__(T); 402 #elif CRYPTOPP_BOOL_SLOW_WORD64 403 return UnsignedMin(4U, sizeof(T)); 404 #else 405 return sizeof(T); 406 #endif 407 } 408 409 inline bool IsAlignedOn(const void *p, unsigned int alignment) 410 { 411 return alignment==1 || (IsPowerOf2(alignment) ? ModPowerOf2((size_t)p, alignment) == 0 : (size_t)p % alignment == 0); 412 } 413 414 template <class T> 415 inline bool IsAligned(const void *p, T *dummy=NULL) // VC60 workaround 416 { 417 return IsAlignedOn(p, GetAlignmentOf<T>()); 418 } and 345 #if CRYPTOPP_BOOL_X64 || CRYPTOPP_BOOL_X86 || defined(__powerpc__) 346 #define CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS 347 #endif The only architecture where the assumption is valid is POWER8, where unaligned vector loads are preferable to ones with forced-alignment.
The code in algparam.h is just disgusting. AssignFromHelperClass binds a reference to NULL just to default-construct a temporary of some type, then binds a const-reference to that temporary, then casts away const to modify the value of that temporary. What seems to be a "VC60 workaround" introduces undefined behaviour by trying to be far too clever. Apparently also too clever to use compiler warnings.
(In reply to Jonathan Wakely from comment #8) > The code in algparam.h is just disgusting. AssignFromHelperClass binds a > reference to NULL just to default-construct a temporary of some type, then > binds a const-reference to that temporary, then casts away const to modify > the value of that temporary. What seems to be a "VC60 workaround" introduces > undefined behaviour by trying to be far too clever. Apparently also too > clever to use compiler warnings. Lol... Yeah, some of the legacy code is awful. I'm not throwing stones because the library supported so many compilers and IDEs back then. I think its time to start cleaning up some of the cruft. I'm going to cite this comment when I propose some of the changes.
(In reply to Richard Biener from comment #6) > So I suppose the IsAligned template is wrong. So I'm clear (please forgive my ignorance)... According to http://www.felixcloutier.com/x86/MOVDQA.html, vmovdqa requires values double quad word alignment (16-byte). A word64 is aligned on 8-bytes, and does not meet the precondition to use the instruction. Naively, that seems like a problem to me. Will that create problems with GCC and vectorizations?
(In reply to Jonathan Wakely from comment #8) > The code in algparam.h is just disgusting. AssignFromHelperClass binds a > reference to NULL just to default-construct a temporary of some type, then > binds a const-reference to that temporary, then casts away const to modify > the value of that temporary. What seems to be a "VC60 workaround" introduces > undefined behaviour by trying to be far too clever. Apparently also too > clever to use compiler warnings. The project is finally using -Wall, and its not flagging any issues with that code. I know -Wdelete-non-virtual-dtor is a problem (and potential security problem), and I'm trying to figure how how to proceed. Cf.https://groups.google.com/d/msg/cryptopp-users/__m0euOdbEQ/tvINItctzjAJ, What additional warnings would you suggest?
(In reply to Jeffrey Walton from comment #10) > (In reply to Richard Biener from comment #6) > > So I suppose the IsAligned template is wrong. > > So I'm clear (please forgive my ignorance)... > > According to http://www.felixcloutier.com/x86/MOVDQA.html, vmovdqa requires > values double quad word alignment (16-byte). A word64 is aligned on 8-bytes, > and does not meet the precondition to use the instruction. Naively, that > seems like a problem to me. > > Will that create problems with GCC and vectorizations? See PR65709 Comment 9 and followups.
A quick update.... We did out best to take the advice of Jakub, Janathan, Markus and others: https://github.com/weidai11/cryptopp/commit/9bf0eed0f6ff6d0b0a2d277e5401d69dc8c0e394. We are paying for past transgressions, and cleanup is not as easy as we would like. Eventually, we want to enable CRYPTOPP_NO_UNALIGNED_DATA_ACCESS by default. It removes all related undefined behavior flagged by UBsan. The only thing stopping us is the politics of such a move.