This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: vec_ld versus vec_vsx_ld on power8
- From: Ewart Timothée <timothee dot ewart at epfl dot ch>
- To: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Fri, 13 Mar 2015 17:11:53 +0000
- Subject: Re: vec_ld versus vec_vsx_ld on power8
- Authentication-results: sourceware.org; auth=none
- References: <1426259770 dot 3168 dot 28 dot camel at gnopaine> <DEE28D58-7F5A-43F6-9876-900FD7C54FB2 at epfl dot ch> <1426265401 dot 3168 dot 37 dot camel at gnopaine>
Hello,
I am very confused now.
Scenario 1, what I have in my code:
machine boots in LE.
1) memory: LE
2) I load (vec_ld)
3) register : LE
4) VSU compute in LE
5) I store (vec_st)
6) memory: LE
Scenario 2 (I did not test this, but it is what I expect if I tell gcc to compile for BE):
machine boots in BE
1) memory: BE
2) I load (vec_vsx_ld)
3) register : BE
4) VSU compute in BE
5) I store (vec_vsx_st)
6) memory: BE
At this point, the VSU can compute in either order.
Chimera scenario 3, what I understand:
machine boots in LE
1) memory: LE
2) I load (vec_vsx_ld) (the load swaps the elements)
3) register : BE
4) swap : LE
5) VSU compute in LE
6) swap : BE
7) I store (vec_vsx_st) (the store swaps the elements)
8) memory: BE
I understand that vec_vsx_ld/vec_vsx_st load and store between LE and BE
layouts, but since the VSU can compute in both modes, what should I swap?
(To be precise, I am working with 32- and 64-bit floats.)
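To make the question concrete, here is a minimal sketch of the load, compute, store sequence from the scenarios above. It assumes GCC's altivec.h with VSX enabled (for example -mcpu=power8); on any other target it falls back to a plain scalar loop. The name add_floats is hypothetical. Note that no explicit swap appears in the source: according to the quoted explanation below, the compiler emits the swap together with the load.

```c
#include <assert.h>

#if defined(__ALTIVEC__) && defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical kernel: out[i] = a[i] + b[i] for n floats.
 * Assumption: n is a multiple of 4 (one 16-byte vector = 4 floats). */
static void add_floats(const float *a, const float *b, float *out, int n)
{
    assert(n % 4 == 0);
#if defined(__ALTIVEC__) && defined(__VSX__)
    for (int i = 0; i < n; i += 4) {
        /* vec_vsx_ld / vec_vsx_st take a byte offset and a pointer. */
        __vector float va = vec_vsx_ld(0, a + i);
        __vector float vb = vec_vsx_ld(0, b + i);
        vec_vsx_st(vec_add(va, vb), 0, out + i);
    }
#else
    /* Portable fallback so the sketch also builds on non-POWER hosts. */
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Calling add_floats(a, b, out, 8) on two arrays of eight floats fills out with the element-wise sums, in the same element order as the inputs.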
Best,
Tim
Timothée Ewart, Ph. D.
http://www.linkedin.com/in/tewart
timothee.ewart@epfl.ch
> On 13 Mar 2015 at 17:50, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
>
> Hi Tim,
>
> Actually, I left out another very good reason why you may want to use
> vec_vsx_ld/st. Sorry for forgetting this.
>
> As you saw, vec_ld translates into the lvx instruction. This
> instruction loads a sequence of 16 bytes into a vector register. For
> big endian, the first byte in memory is loaded into the high order byte
> of the register. For little endian, the first byte in memory is loaded
> into the low order byte of the register.
>
> This is fine if the data you are loading is arrays of characters, but is
> not so fine if you are loading arrays of larger items. Suppose you are
> loading four integers {1, 2, 3, 4} into a register with lvx. In big
> endian you will see:
>
> 00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 04
>
> In little endian you will see:
>
> 04 00 00 00 03 00 00 00 02 00 00 00 01 00 00 00
>
> But for this to be interpreted as a vector of integers ordered for
> little endian, what you really want is:
>
> 00 00 00 04 00 00 00 03 00 00 00 02 00 00 00 01
>
> If you use vec_vsx_ld, the compiler will generate a lxvd2x instruction
> followed by an xxpermdi that swaps the doublewords. After the lxvd2x
> you will have:
>
> 00 00 00 02 00 00 00 01 00 00 00 04 00 00 00 03
>
> because the two LE doublewords are loaded in BE (reversed) order.
> Swapping the two doublewords restores sanity:
>
> 00 00 00 04 00 00 00 03 00 00 00 02 00 00 00 01
>
> So, even if your data is properly aligned, the use of vec_ld = lvx is
> only correct if you are loading arrays of bytes. Arrays of anything
> larger must use vec_vsx_ld to avoid errors.
>
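The two register images quoted above for the doubleword load and the following swap can be reproduced with a small portable model. This is only a sketch: it runs no POWER instruction, all function names are hypothetical, and it assumes the register is printed from its high-order byte to its low-order byte, as in the byte dumps above. It models the load (lxvd2x on LE) as keeping each 8-byte half as a little-endian value while placing the half at the lower address in the high-order half of the register.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Convention: reg[0] is the high-order byte of the 16-byte register. */

/* Write four 32-bit values to mem in little-endian byte order,
 * independent of the host's own endianness. */
static void store_le_words(const unsigned v[4], unsigned char mem[16])
{
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            mem[4 * i + k] = (unsigned char)((v[i] >> (8 * k)) & 0xFF);
}

/* Model of the doubleword load on LE: within each doubleword the first
 * memory byte is the least significant, but the doubleword at the lower
 * address ends up in the high-order half of the register. */
static void model_dw_load_le(const unsigned char mem[16], unsigned char reg[16])
{
    for (int d = 0; d < 2; ++d)
        for (int k = 0; k < 8; ++k)
            reg[8 * d + (7 - k)] = mem[8 * d + k];
}

/* Model of the doubleword swap performed after the load. */
static void model_dw_swap(unsigned char reg[16])
{
    unsigned char tmp[8];
    memcpy(tmp, reg, 8);
    memcpy(reg, reg + 8, 8);
    memcpy(reg + 8, tmp, 8);
}

/* Format the register as "00 00 ... 00" (47 characters plus NUL). */
static void format_reg(const unsigned char reg[16], char out[48])
{
    assert(reg != NULL && out != NULL);
    for (int i = 0; i < 16; ++i)
        sprintf(out + 3 * i, i < 15 ? "%02X " : "%02X", reg[i]);
}
```

Feeding {1, 2, 3, 4} through store_le_words, model_dw_load_le and format_reg gives the image shown after the load; applying model_dw_swap and formatting again gives the image shown after the swap.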
> Again, sorry for my previous omission!
>
> Thanks,
>
> Bill Schmidt, Ph.D.
> IBM Linux Technology Center
>
> On Fri, 2015-03-13 at 15:42 +0000, Ewart Timothée wrote:
>> thank you very much for this answer.
>> I know my memory is aligned so I will use vec_ld/st only.
>>
>> best
>>
>> Tim