How to use the KNC vector registers with GCC? Race condition with ICC & KNC?

Stephan Walter stephan.walter@ziti.uni-heidelberg.de
Tue Jan 21 20:03:00 GMT 2014


On 21.01.2014 20:53, Brian Budge wrote:
> On Tue, Jan 21, 2014 at 2:23 AM, Stephan Walter
> <stephan.walter@ziti.uni-heidelberg.de> wrote:
>> Hi,
>>
>> I am new to the GCC mailing list, so I hope this is the right place.
>>
>> As the subject says, I work with KNC. I have developed a kernel module
>> for a NIC and now want to use the 512-bit registers of KNC for some
>> memcpy jobs.
>>
>> I have experience using GCC to compile the KNC Linux kernel and kernel
>> modules, so that part is no problem at the moment. Everything works fine.
>>
>> Before starting to write inline assembler with the 512-bit registers, I
>> wrote some minimal examples.
>>
>> On a normal i5-3470 everything works fine with GCC, and the same code
>> also works on KNC. The problem is that when I try to use the 512-bit
>> registers, GCC does not seem to know the register names or the
>> instructions.
>>
>> The instructions should not be a problem, because I have the instruction
>> manual, but I have no idea how to solve the register problem.
>>
>> So I tried ICC with -mmic. The source compiles, but when I measure the
>> clock cycles with rdtsc, the first two reads work and the third and
>> fourth do not.
>> I tried to track the problem down with gdb, but when I compile with -g
>> the error no longer occurs. It also disappears when I add a printf,
>> sleep(1) or usleep(1). So I think there is a race condition on the write
>> of the value to memory, because 1 or even 100 nops have no effect.
>>
>> My inline assembler knowledge is rudimentary, so I don't know whether I
>> have made a mistake with the clobber registers and so on, or whether
>> there is a bug in gcc or icc.
>>
>> Because compiling with -g under ICC makes the problem go away, I cannot
>> debug it. So I hope somebody is able to help me.
>>
>> My preference is to use GCC together with the 512-bit registers; if there
>> is a bug in my inline assembler, a solution or hint would also be welcome.
>>
>> Here is my code:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <inttypes.h>
>>
>>
>> int rdtsc_count(void){
>> int count;
>> __asm__ __volatile__(   "rdtsc;                 \n\t"
>>                          "movl   %%eax, %0;      \n\t"
>>                           :"=m"(count)//, "=r"(brd), "=r"(crd), "=r"(drd)
>>                           :
>>                           :"%eax", "memory"//, "cc"//, "%ebx", "%ecx"
>>                          );
>>
>> return count;
>> }
>>
>>
>> int main(int argc, char *argv[]){
>>
>> int starta=0, startb=0, stopa=0, stopb=0;
>> int buffer_size=32;
>> uint64_t* buffer;
>> uint32_t buflen=atoi(argv[1]);
>>
>>
>> /////////////setup
>> buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
>> packet_buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
>> packet_buffer_ref= (uint64_t*) malloc (buffer_size*sizeof(uint64_t));//REF
>>
>> waddr=0;
>>
>> //printf("Address of packet_buffer %x", waddr);
>> printf("Original data\n");
>> for(i=0; i<buffer_size; i++){
>>          buffer[i]=i+i*i;
>>          packet_buffer[i]=0;
>>          packet_buffer_ref[i]=0;
>>          printf("%x\t", buffer[i]);
>> };
>> printf("\n");
>>
>> printf("packet_buffer start\n");
>> for(i=0; i<buffer_size; i++){
>>          printf("%x\t", packet_buffer[i]);
>> };
>> printf("\n");
>>
>> ////////////end_setup
>>
>> if(buflen==0 | buflen>120){
>>          printf("buflen too big or too small\n");
>>          return 0;
>> }
>>
>>
>> ########################################
>> starta=rdtsc_count();
>> memcpy(&(packet_buffer_ref[waddr+1]), buffer,
>> sizeof(uint64_t)*(buflen));//REF
>> stopa=rdtsc_count();
>> printf("memcpy took\t%d\tclocks\n", stopa-starta);
>> ########################################
>> ##Here everything is fine
>> ########################################
>>
>> ########################################
>> startb=rdtsc_count();
>> __asm__ (             "movq   %1,             %%rsi;          \n\t"
>>                          "movq   %0,             %%rdi;          \n\t"
>>                          "movl   %2,             %%ecx;          \n\t"
>>                          "addq   $8,             %%rdi;          \n\t"
>> //                      "shl    $3,             %%ecx;          \n\t"
>>          "Schleife:       movsq;                                 \n\t"
>>                          "loop Schleife;                         \n\t"
>>                          :"=m"(packet_buffer)
>>                          :"r"(buffer), "r"(buflen)
>>                          :"%rsi", "%rdi", "%rcx", "memory"
>>                          );
>>
>> stopb=rdtsc_count();
>>
>> ######################################### If I use one of these functions,
>> everything is fine.
>> //usleep(1);
>> //printf("stopa %d\n", stopa);
>> //printf("fdsagfa\n");
>> #########################################
>> printf("asm movsq took\t%d\tclocks\n", stopb-startb);
>>
>> ########################################
>> ##Here I have the problem. It looks like stopb or startb is still 0 when I
>> call no function between rdtsc_count() and the output.
>> ########################################
>>
>>
>
> I'm unsure if this is what is causing your problem, but rdtsc can be
> executed out of order relative to other instructions, so instructions
> issued prior to rdtsc need not be complete before the measurement is
> made.  I've seen the cpuid instruction used to force them to complete
> first.  I believe also that using rdtscp will prevent the reordering on
> its own.
>
>    Brian
>
KNC is an in-order CPU design, so there should be no instruction
reordering. The only remaining possibility is superscalar execution, but
then I don't know how to be sure that an instruction has already completed.
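
For reference, a minimal sketch of how the two pieces of inline assembler
could be written so that the compiler knows every register they touch,
assuming GCC-style extended asm on x86-64. RDTSC writes its result to
EDX:EAX, while rdtsc_count() in the quoted code only lists %eax as clobbered,
so a value the compiler happens to keep in EDX could be overwritten; whether
that explains the zero readings is not confirmed. The names tsc_read,
tsc_read_serialized and copy_qwords are hypothetical, and whether CPUID is
needed for ordering on an in-order core such as KNC is left open.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the time-stamp counter. RDTSC writes EDX:EAX, so both registers are
   declared as outputs and the compiler will not keep live values in them. */
static inline uint64_t tsc_read(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Same read, preceded by CPUID as a serializing instruction (the approach
   mentioned in the quoted reply). CPUID overwrites EAX, EBX, ECX and EDX,
   so EBX and ECX are listed as clobbers; "memory" keeps the compiler from
   moving loads and stores across the measurement point. */
static inline uint64_t tsc_read_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("xorl %%eax, %%eax\n\t"
                         "cpuid\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Copy n 64-bit words with REP MOVSQ. The "+D", "+S" and "+c" constraints
   pin the operands to RDI, RSI and RCX and mark them as read and written,
   so no separate clobber entries or explicit register moves are needed. */
static inline void copy_qwords(uint64_t *dst, const uint64_t *src, size_t n)
{
    __asm__ __volatile__("cld\n\t"
                         "rep movsq"
                         : "+D"(dst), "+S"(src), "+c"(n)
                         :
                         : "memory", "cc");
}
```

A measurement would then take one tsc_read_serialized() value before and one
after the call to copy_qwords() and subtract them as uint64_t, which also
avoids truncating the counter to a signed 32-bit int.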


