This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: vec_ld versus vec_vsx_ld on power8

From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
To: Ewart Timothée <timothee dot ewart at epfl dot ch>
Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
Date: Fri, 13 Mar 2015 12:27:25 -0500
Subject: Re: vec_ld versus vec_vsx_ld on power8
Authentication-results: sourceware.org; auth=none
References: <1426259770 dot 3168 dot 28 dot camel at gnopaine> <DEE28D58-7F5A-43F6-9876-900FD7C54FB2 at epfl dot ch> <1426265401 dot 3168 dot 37 dot camel at gnopaine> <CC6720D1-1ED7-4323-A95D-F70C8A0F4871 at epfl dot ch>

Hi Tim,

Sorry to have confused you.  This stuff is a bit boggling the first 200
times you look at it...

For both 32-bit and 64-bit floating-point, you should use ld_vsx_vec on
both BE and LE machines, and the compiler will take care of doing the
right thing for you in both cases.  You do not have to add any swaps
yourself.

When compiling for big-endian, ld_vsx_vec will translate into either
lxvw4x (for 32-bit floating-point) or lxvd2x (for 64-bit
floating-point).  The values will be loaded into the register from
left-to-right (BE ordering).

When compiling for little-endian, ld_vsx_vec will translate into lxvd2x
followed by xxpermdi for both 32-bit and 64-bit floating-point.  This
does the right thing in both cases.  The values will be loaded into the
register from right-to-left (LE ordering).

The vector programming model is set up to allow you to usually code the
same way for both BE and LE.  This is discussed more in Chapter 6 of the
ELFv2 ABI manual, which can be obtained from the OpenPOWER Connect
website (free registration required):

https://www-03.ibm.com/technologyconnect/tgcm/TGCMServlet.wss?alias=OpenPOWER&linkid=1n0000

Bill


On Fri, 2015-03-13 at 17:11 +0000, Ewart TimothÃe wrote:
> Hello,
> 
> I am super confuse now
> 
> scenario 1, what I have in m code:
> machine boots in LE.
> 
> 1) memory: LE
> 2) I load (ld_vec)
> 3) register : LE
> 4) VSU compute in LE
> 5) I store (st_vec)
> 6) memory: LE
> 
> scenario 2: ( I did not test but it is what I get if I order gcc to compiler in BE)
> machine boot in BE
> 
> 1) memory: BE
> 2) I load (ld_vsx_vec)
> 3) register : BE
> 4) VSU compute in BE 
> 5) I store (st_vsx_vec)
> 6) memory: BE
> 
> At this point the VUS compute in both order
> 
> chimera scenario 3, what I understand:
> 
> machine boot in LE
> 
> 1) memory: LE
> 2) I load (ld_vsx_vec)  (the load swap the element)
> 3) register : BE
> 4) swap : LE
> 5) VSU compute in LE
> 6) swap : BE 
> 5) I store (st_vsx_vec) (the store swap the element)
> 6) memory: BE
> 
> I understand  ld/st_vsx_vec load/store from LE/BE, but as the VXU can compute
> in both mode what should I swap (I precise I am working with 32/64 bits float)
> 
> Best,
> 
> Tim
> 
> TimothÃe Ewart, Ph. D. 
> http://www.linkedin.com/in/tewart
> timothee.ewart@epfl.ch
> 
> 
> 
> 
> 
> 
> > Le 13 Mar 2015 Ã 17:50, Bill Schmidt <wschmidt@linux.vnet.ibm.com> a Ãcrit :
> > 
> > Hi Tim,
> > 
> > Actually, I left out another very good reason why you may want to use
> > vec_vsx_ld/st.  Sorry for forgetting this.
> > 
> > As you saw, vec_ld translates into the lvx instruction.  This
> > instruction loads a sequence of 16 bytes into a vector register.  For
> > big endian, the first byte in memory is loaded into the high order byte
> > of the register.  For little endian, the first byte in memory is loaded
> > into the low order byte of the register.
> > 
> > This is fine if the data you are loading is arrays of characters, but is
> > not so fine if you are loading arrays of larger items.  Suppose you are
> > loading four integers {1, 2, 3, 4} into a register with lvx.  In big
> > endian you will see:
> > 
> >  00 00 00 01  00 00 00 02  00 00 00 03  00 00 00 04
> > 
> > In little endian you will see:
> > 
> >  04 00 00 00  03 00 00 00  02 00 00 00  01 00 00 00
> > 
> > But for this to be interpreted as a vector of integers ordered for
> > little endian, what you really want is:
> > 
> >  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01
> > 
> > If you use vec_vsx_ld, the compiler will generate a lxvw2x instruction
> > followed by an xxpermdi that swaps the doublewords.  After the lxvw2x
> > you will have:
> > 
> >  00 00 00 02  00 00 00 01  00 00 00 04  00 00 00 03
> > 
> > because the two LE doublewords are loaded in BE (reversed) order.
> > Swapping the two doublewords restores sanity:
> > 
> >  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01
> > 
> > So, even if your data is properly aligned, the use of vec_ld = lvx is
> > only correct if you are loading arrays of bytes.  Arrays of anything
> > larger must use vec_vsx_ld to avoid errors.
> > 
> > Again, sorry for my previous omission!
> > 
> > Thanks,
> > 
> > Bill Schmidt, Ph.D.
> > IBM Linux Technology Center
> > 
> > On Fri, 2015-03-13 at 15:42 +0000, Ewart TimothÃe wrote:
> >> thank you very much for this answer.
> >> I know my memory is aligned so I will use vec_ld/st only.
> >> 
> >> best
> >> 
> >> Tim
> >> 
> >> 
> >> 
> >> 
> >> 
> > 
> > 
>

References:
- vec_ld versus vec_vsx_ld on power8
  - From: Bill Schmidt
- Re: vec_ld versus vec_vsx_ld on power8
  - From: Ewart Timothée
- Re: vec_ld versus vec_vsx_ld on power8
  - From: Bill Schmidt
- Re: vec_ld versus vec_vsx_ld on power8
  - From: Ewart TimothÃe

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]