This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Dedicated logical instructions
- From: "Radu Hobincu" <radu dot hobincu at arh dot pub dot ro>
- To: "Ian Lance Taylor" <iant at google dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Wed, 10 Nov 2010 18:45:39 +0200 (EET)
- Subject: Re: Dedicated logical instructions
> "Radu Hobincu" <email@example.com> writes:
>> However, now I have another problem. I have 2 instructions in the ISA:
>> 'where' and 'endwhere' which modify the behavior of the instructions put
>> in between them. I made a macro with inline assembly for each of them. The
>> problem is that since `endwhere` doesn't have any operands and doesn't
>> clobber any registers, the GCC optimization reorders it and places the
>> `endwhere` immediately after `where`, leaving all the instructions outside
>> the block.
> That's tricky in general. You want an absolute barrier, but gcc doesn't
> really provide one that can be used in inline asm. The closest you can
> come is by adding a clobber of "memory":
> asm volatile ("xxx" : /* outputs */ : /* inputs */ : "memory");
> That will block all instructions that load or store from memory from
> moving across the barrier. However, it does not currently block
> register changes from moving across the barrier. I don't know whether
> that matters to you.
It does matter, unfortunately. I've tried the memory clobber, with the same
result (the addition in the example doesn't do any memory loads/stores).
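
One possible workaround, sketched here under assumptions rather than taken from this thread: list the values used inside the block as read-write operands of the two asm statements, so the addition gets a data dependency on both. The macro names WHERE_BEGIN and WHERE_END and the function guarded_add are hypothetical, the asm templates are left empty so the sketch compiles with GCC on any target, and the "r" constraint stands in for whatever constraint the vector registers would really need.

```c
/* Hypothetical stand-ins for the real `where`/`endwhere` macros.
   The templates are empty so this builds on any GCC target; on the
   real machine the mnemonic would go inside the string. */
#define WHERE_BEGIN(x, y) \
    __asm__ __volatile__("" /* where ... */ : "+r"(x), "+r"(y) : : "memory")
#define WHERE_END(r) \
    __asm__ __volatile__("" /* endwhere */ : "+r"(r) : : "memory")

int guarded_add(int a, int b)
{
    int c;

    WHERE_BEGIN(a, b);  /* a and b now look as if this asm produced them */
    c = a + b;          /* reads a and b, so it cannot move above the asm */
    WHERE_END(c);       /* reads and rewrites c, so it cannot move above the add */

    return c;
}
```

This only pins down operations on the listed operands. Unrelated register operations can still be scheduled into or out of the region, so it is not the absolute barrier discussed above.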
> You didn't really describe what these instructions do, but they sound
> like looping instructions which ideally gcc would generate itself. They
> have some similarity to the existing doloop pattern, q.v. If you can
> get gcc to generate the instructions itself, then it seems to me that
> you will get better code in general and you won't have to worry about
> this issue.
I have 16 vector registers in the machine, R16-R31, each of which has 128
cells of 16 bits. These support ALU operations and loads/stores just
like normal registers, but in one clock. So an
add R16 R17 R18
will add the whole R17 array to R18 (corresponding cells) and place the
result in R16. The 'where' instruction places a mask on the array so the
operation is done only where a certain condition is met; in the example in
the previous e-mail, that is where `a` is less than `b`. I've read the
description of doloop and I don't think I can use it in this case. I'll
have to dig more or settle for -O0 and cry.
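
To make the masked behaviour concrete, here is a scalar reference model in C of an add under `where a < b`. It is only a sketch: the name where_lt_add is hypothetical, and it assumes the cells are unsigned with wrap-around arithmetic and that cells outside the mask keep their previous value, neither of which is stated above.

```c
#include <stddef.h>
#include <stdint.h>

#define CELLS 128  /* cells per vector register, as described above */

/* Scalar model of:  where (a < b);  add dst a b;  endwhere
   Assumption: cells where the condition fails are left unchanged. */
void where_lt_add(uint16_t dst[CELLS],
                  const uint16_t a[CELLS],
                  const uint16_t b[CELLS])
{
    for (size_t i = 0; i < CELLS; ++i) {
        if (a[i] < b[i]) {
            /* Assumption: 16-bit unsigned wrap-around on overflow. */
            dst[i] = (uint16_t)(a[i] + b[i]);
        }
    }
}
```

The hardware does all 128 cells in one clock; the loop only spells out what each cell sees.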
Thank you, anyway!