This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: generating unaligned vector load instructions?


On 19.09.2013 at 03:27, Tim Prince <n8tm@aol.com> wrote:

On 9/18/2013 7:01 PM, Norbert Lange wrote:
Hello Tim,

Can you specify which versions, maybe post the command line, or try compiling for 32-bit (-m32 switch)? Also, I don't understand the comment about splitting. To avoid misunderstanding: the generated code segfaults on my Athlon X2, so it's not a question of optimal code, but of working code.

I'm unable to generate the right instruction, and I don't know exactly why it should differ between versions (except for bugs, of course). I just want to know the right way to force unaligned loads, without inline assembly.

Btw: the code doesn't compile on gcc < 4.7, as I just realised - older versions can't multiply a vector by a scalar.
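
For reference, a minimal sketch of that version difference (made-up names, not the attached testvecs.c): gcc only accepts mixed vector/scalar arithmetic from 4.7 on, so older releases need the scalar splatted into a vector by hand.

#include <stdint.h>

typedef int32_t v4si __attribute__((vector_size(16)));

v4si scale(v4si v, int32_t s)
{
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)
    return v * s;                       /* vector * scalar, accepted by gcc >= 4.7 */
#else
    return v * (v4si){ s, s, s, s };    /* older gcc needs the splat spelled out   */
#endif
}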
I wasn't even certain which of my gcc installations had 32-bit counterparts, but Red Hat 4.4.6 appeared to accept your code for -m64 and reject it for -m32. Intel icc, which shares a lot of stuff with the active gcc, rejected your code. Many people here advocate options such as -pedantic -Wall to increase the number of warnings, so you will get those warnings even where gcc accepts your code.

I thought the X2 could accept nearly all normal SSE2 code (the original Turion didn't), but I guess you want to test its limits. Now that you've revealed your actual target, someone might suggest a more appropriate arch option. Did you read about the errata for this instruction on your chip? http://support.amd.com/us/Processor_TechDocs/25759.pdf

Splitting unaligned 128-bit moves into separate 64-bit moves was a common tactic likely to improve performance on CPUs prior to AMD Barcelona and Intel Nehalem (not to mention avoiding bugs in the hardware implementation). It probably didn't hurt to split the instruction explicitly on a CPU where the hardware would split it anyway (I thought this might be true of the X2). Even with Intel Westmere there were situations where splitting might improve performance. So gcc can't be faulted if it makes that translation when you didn't tell it to compile for a more recent CPU, or when you specify a target which is known to have problems with certain instructions.


Thanks for your time and help, but I believe you are missing the main point.

The code in question generates an aligned load instruction, "movdqa", which will cause an alignment fault on ALL CPUs unless the data happens to land on a 16-byte boundary - and that's down to luck, since its alignment is 4. "movdqu" is the instruction that should be generated, and it works fine if I use inline assembly for the load - but that's precisely what I don't want. gcc simply produces wrong code (and consistently, no matter what I put into -march); this is not about tuning. The idea was to use the vector extensions and let gcc output the optimal scalar or vector code.

Well, I added a new version with a main routine, so this should allow running the code. With -msse2 the binary segfaults with the unaligned pointer, no matter what I do.
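
To make concrete what I mean by forcing an unaligned load without inline assembly, here is a minimal sketch (made-up names, not the attached testvecs.c; assuming x86 with -O2 -msse2). Both a typedef with reduced alignment and the SSE2 intrinsic should end up as movdqu:

#include <stdint.h>
#include <stdio.h>
#include <emmintrin.h>

typedef int32_t v4si __attribute__((vector_size(16)));
/* same vector type, but with alignment 1, so gcc must assume the
   pointer can be unaligned */
typedef v4si v4si_u __attribute__((aligned(1)));

static v4si load_via_typedef(const void *p)
{
    return *(const v4si_u *)p;                        /* should emit movdqu */
}

static v4si load_via_intrinsic(const void *p)
{
    return (v4si)_mm_loadu_si128((const __m128i *)p); /* SSE2 unaligned-load intrinsic */
}

int main(void)
{
    int32_t buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    /* buf + 1 is 4-byte aligned, almost certainly not 16-byte aligned */
    v4si a = load_via_typedef(buf + 1);
    v4si b = load_via_intrinsic(buf + 1);
    printf("%d %d\n", (int)a[0], (int)b[3]);
    return 0;
}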

Some other funny bits:

* Compiling for ARM correctly generates unaligned byte loads with this code (it doesn't have a vector ISA for ints), so it might be the x86 backend that loses the unaligned property somewhere.
* memcpy seems to be able to generate the "movdqu" instruction, but it's very fragile: using a pointer to the packed struct generates the single "movdqu" instruction, while correctly using a pointer to the member generates a scalar inline memcpy (sketch of the general shape below).
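
For reference, the general shape of the memcpy trick (a made-up sketch, not the packed-struct code from testvecs.c): copying into a local vector leaves the choice of load to gcc, and with -O2 -msse2 the copy tends to be folded into a single movdqu.

#include <stdint.h>
#include <string.h>

typedef int32_t v4si __attribute__((vector_size(16)));

v4si load_via_memcpy(const void *p)
{
    v4si v;
    memcpy(&v, p, sizeof v);   /* with -O2 -msse2 this usually becomes one movdqu */
    return v;
}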

Attachment: testvecs.c
Description: Binary data

