This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: gcc 3.2 altivec options and glibc


On Tue, 2002-08-27 at 11:24, Gabriel Paubert wrote:

> I know it's off-topic, but you are ignoring Altivec's merge instruction,

merge? You probably mean permute (vperm)...

> which allows you to write a compact memcpy loop (a 9-instruction loop to copy 32
> bytes: 2 loads, 2 stores, 2 merges, 2 address bumps and one decrement and
> branch), taking care of the alignment by shuffling bytes around in
> registers.

Generic code using a constant permute vector is always relatively slow
compared to code working on aligned data, because a single unaligned
16-byte read needs two aligned 16-byte reads, and an unaligned write needs
2 reads + 2 writes plus several helper instructions. Memory access on
PowerPC is traditionally quite slow because of the way the front-side bus
works.
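To make that cost concrete, here is a portable C emulation of the usual AltiVec idiom for reading misaligned data: lvsl builds a permute control from the low 4 address bits, two aligned lvx loads fetch the neighbouring 16-byte blocks, and one vperm extracts the wanted bytes. The vec16 type and the load16, make_shift_control, perm16 and copy_misaligned helpers are hypothetical plain-C stand-ins so the logic can run anywhere; real code would use the altivec.h intrinsics. The destination is assumed aligned, and carrying the previous load into the next iteration gives one load, one permute and one store per 16 bytes, i.e. the shape of the loop Gabriel describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-in for a 16-byte AltiVec register. */
typedef struct { uint8_t b[16]; } vec16;

/* Stand-in for lvx: load one aligned 16-byte block. */
static vec16 load16(const uint8_t *block)
{
    vec16 v;
    memcpy(v.b, block, 16);
    return v;
}

/* Stand-in for lvsl: control vector {sh, sh+1, ..., sh+15},
   where sh is the source misalignment (address & 15). */
static vec16 make_shift_control(unsigned sh)
{
    vec16 c;
    for (unsigned i = 0; i < 16; i++)
        c.b[i] = (uint8_t)(sh + i);
    return c;
}

/* Stand-in for vperm: each control byte (low 5 bits) selects one
   byte from the 32-byte concatenation of a and b. */
static vec16 perm16(vec16 a, vec16 b, vec16 c)
{
    vec16 r;
    for (unsigned i = 0; i < 16; i++) {
        unsigned k = c.b[i] & 31u;
        r.b[i] = k < 16 ? a.b[k] : b.b[k - 16];
    }
    return r;
}

/* Copy nvec * 16 bytes that start sh bytes into the aligned buffer
   src_blocks to dst. src_blocks must hold (nvec + 1) * 16 readable
   bytes, because every output vector is built from two neighbouring
   aligned loads. */
static void copy_misaligned(uint8_t *dst, const uint8_t *src_blocks,
                            unsigned sh, size_t nvec)
{
    assert(sh < 16);
    vec16 ctl = make_shift_control(sh);
    vec16 prev = load16(src_blocks);
    for (size_t i = 0; i < nvec; i++) {
        vec16 next = load16(src_blocks + 16 * (i + 1));
        vec16 out = perm16(prev, next, ctl);
        memcpy(dst + 16 * i, out.b, 16);  /* stand-in for stvx */
        prev = next;                      /* reuse: one load per vector */
    }
}
```

Unrolled twice, the loop body is two loads, two permutes and two stores per 32 bytes, plus the pointer bumps and the counted branch.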

> Of course it's only worth it for fairly large copies, especially
> since the head and tail of the copy are likely to have a non negligible
> icache footprint.

Exactly my point.

> With a suitable shuffle register parameter, Altivec's
> merge instruction can be used for many other things, like endian
> conversion, etc., but that's beside the point.
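A minimal sketch of the endian-conversion case: byte-swapping every 2-, 4- or 8-byte element of a vector is a single permute whose control vector lists the byte indices in reverse order within each element, e.g. {3,2,1,0, 7,6,5,4, ...} for 32-bit words. The vec16 type and the perm16, byteswap_control and byteswap helpers are hypothetical plain-C stand-ins for a vector register and vperm; on AltiVec the same thing would be one vec_perm with a constant control vector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for a 16-byte AltiVec register. */
typedef struct { uint8_t b[16]; } vec16;

/* Stand-in for vperm: each control byte (low 5 bits) selects one
   byte from the 32-byte concatenation of a and b. */
static vec16 perm16(vec16 a, vec16 b, vec16 c)
{
    vec16 r;
    for (unsigned i = 0; i < 16; i++) {
        unsigned k = c.b[i] & 31u;
        r.b[i] = k < 16 ? a.b[k] : b.b[k - 16];
    }
    return r;
}

/* Control vector that reverses the bytes inside each elem_size-byte
   element (elem_size = 2, 4 or 8). */
static vec16 byteswap_control(unsigned elem_size)
{
    vec16 c;
    assert(elem_size == 2 || elem_size == 4 || elem_size == 8);
    for (unsigned i = 0; i < 16; i++)
        c.b[i] = (uint8_t)((i / elem_size) * elem_size
                           + (elem_size - 1 - i % elem_size));
    return c;
}

/* Swap the byte order of every element with one permute. */
static vec16 byteswap(vec16 v, unsigned elem_size)
{
    return perm16(v, v, byteswap_control(elem_size));
}
```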
 
> Besides that, all vector instruction sets can at least be used for memset.

You can use it for anything, but performance will likely suck with a large
number of unaligned memory accesses, which is what you'll most likely get
with standard functions.

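Even this:

#include <altivec.h>

typedef short DCTELEM;  /* assumed: 16-bit DCT coefficient type */

/* Zero six 8x8 blocks of DCT coefficients (6 * 64 elements).
   blocks must be 16-byte aligned: vec_st ignores the low 4 address bits. */
void clear_blocks_altivec (DCTELEM *blocks)
{
  /* all-zero vector, instead of xor-ing an uninitialized variable */
  const vector signed short zero = vec_splat_s16 (0);
  unsigned int offset;

  for (offset = 0; offset < sizeof(DCTELEM) * 6 * 64; offset += 16)
    vec_st (zero, offset, blocks);
}

is, even with unrolling and several other tricks, no real burner, although
its alignment is guaranteed. And for larger mem* calls you'll probably want
to use the cache block instructions (e.g. dcbz) anyway, because they give
maximum throughput.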
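The cache-block approach for a large memset can be sketched as byte stores for the unaligned head and tail plus whole cache-block clears (what dcbz does on PowerPC) for the aligned middle. This is a portable emulation under stated assumptions: zero_cache_block is a hypothetical stand-in for dcbz implemented with memset, bzero_blocks is a hypothetical name, and the 32-byte block size is assumed (G4-class cores; other PowerPC cores use different sizes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cache block size (G4-class PowerPC); not universal. */
#define CACHE_BLOCK 32u

/* Hypothetical stand-in for dcbz: zero one whole, aligned cache block. */
static void zero_cache_block(uint8_t *block)
{
    assert((uintptr_t)block % CACHE_BLOCK == 0);
    memset(block, 0, CACHE_BLOCK);
}

/* Zero n bytes at dst: plain stores for the unaligned head and tail,
   whole-block clears for the aligned middle. */
static void bzero_blocks(void *dst, size_t n)
{
    uint8_t *p = dst;
    size_t head = (CACHE_BLOCK - (uintptr_t)p % CACHE_BLOCK) % CACHE_BLOCK;

    if (head > n)
        head = n;
    memset(p, 0, head);            /* unaligned head */
    p += head;
    n -= head;

    while (n >= CACHE_BLOCK) {     /* aligned body, one block at a time */
        zero_cache_block(p);
        p += CACHE_BLOCK;
        n -= CACHE_BLOCK;
    }
    memset(p, 0, n);               /* tail shorter than one block */
}
```

On real hardware the gain comes from dcbz establishing a zeroed line in the cache without first reading it from memory; the head and tail code is also where the icache footprint mentioned earlier comes from.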

-- 
Servus,
       Daniel


