Poor man's JIT compiler

Robert Bernecky bernecky@snakeisland.com
Wed Sep 2 22:32:00 GMT 2009


Hi, Dean.

My initial attempt at compiler options was just -O0.

That resulted in the jmp insertion problem, so I conjectured
that there might be some alignment requirements/desires that
would result in jmp instructions being added to make each
labeled fragment start on an "appropriate" boundary.
Clearly, the no-align options did not help.

So, I just tried out your suggestion:

#define OP(nm, cod)  \
FS##nm: cod          \
    asm("nop" : : );  \
FE##nm:

This has the effect of inserting a NOP at the end of each code fragment.
And, it DOES appear to work (although I just quickly eyeballed
the asm code, so I might be missing something). I'll give
it a more careful workover tomorrow. (That was WITH the current
-fno-align-labels and -fno-align-jumps options still active.)
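
One consequence of putting the NOP before the FE##nm label is that each
fragment's length (FE##nm - FS##nm) now includes that trailing byte. Below is
a rough sketch of the catenation step with that in mind. It is not the actual
code from this project. The byte arrays are placeholders standing in for
compiled fragments (only the final 0x90, the one-byte x86 NOP, is meaningful),
and the names frag_t, frag_table, catenate and strip_nop are hypothetical. The
sketch only copies bytes into a buffer and does not try to execute them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define X86_NOP 0x90  /* one-byte x86 NOP, what asm("nop") assembles to */

/* One table entry: fragment start address and length in bytes.
   In the real setup these would come from &&FS##nm and FE##nm - FS##nm. */
typedef struct {
    const unsigned char *start;
    size_t len;
} frag_t;

/* Placeholder bytes, NOT real instruction encodings (assumption for
   illustration). Each fragment ends in the NOP added by the macro. */
static const unsigned char load1_code[] = { 0x11, 0x12, 0x13, X86_NOP };
static const unsigned char load2_code[] = { 0x21, 0x22, X86_NOP };
static const unsigned char add_code[]   = { 0x31, X86_NOP };
static const unsigned char store_code[] = { 0x41, 0x42, 0x43, 0x44, X86_NOP };

enum { LOAD1, LOAD2, ADD, STORE, NFRAGS };

static const frag_t frag_table[NFRAGS] = {
    [LOAD1] = { load1_code, sizeof load1_code },
    [LOAD2] = { load2_code, sizeof load2_code },
    [ADD]   = { add_code,   sizeof add_code   },
    [STORE] = { store_code, sizeof store_code },
};

/* Copy the fragments named in order[0..n-1] into buf, back to back.
   If strip_nop is nonzero, a trailing NOP byte is dropped from each
   fragment. Returns the number of bytes written, or 0 if buf is too
   small (or the list is empty). */
size_t catenate(const int *order, size_t n,
                unsigned char *buf, size_t cap, int strip_nop)
{
    size_t used = 0;

    for (size_t k = 0; k < n; k++) {
        assert(order[k] >= 0 && order[k] < NFRAGS);
        const frag_t *f = &frag_table[order[k]];
        size_t len = f->len;

        if (strip_nop && len > 0 && f->start[len - 1] == X86_NOP)
            len--;                    /* drop the padding NOP */
        if (len > cap - used)
            return 0;                 /* would overflow the code buffer */
        memcpy(buf + used, f->start, len);
        used += len;
    }
    return used;
}
```

Called with the order (LOAD2, LOAD1, ADD, STORE) and strip_nop set, this
writes the four fragment bodies end to end without the NOPs. Leaving the NOPs
in is harmless for correctness and only costs one byte per fragment. Stripping
by inspecting the last byte is only safe if the fragment is known to end in
the macro's NOP; a fragment whose final instruction happens to end in 0x90
would be damaged.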

Now, what was it that led you to propose that inserting a NOP
would have the desired effect?

Many thanks for your reply!
Robert

Dean Anderson wrote:
> I suspect it does this because of instruction alignment and pipelining
> issues.   Why are you trying to turn off alignment?
> 
> You might try adding a nop after each one. 
> 
> 		--Dean
> 
> On Tue, 1 Sep 2009, Robert Bernecky wrote:
> 
>> I'm trying to get gcc version 4.3.2 to emit X86-64 code
>> fragments that I can catenate to perform my own JIT
>> compilation, but the compiler is being recalcitrant.
>>
>> (I was using a jump table, but its performance was underwhelming.)
>>
>> Roughly, what I've done is to create a set of code fragments,
>> with labels so that I can determine their address (via &&label)
>> and length. E.g.,
>>
>> topLoad1:  reg1 = x[i];
>> botLoad1:
>>
>> topLoad2:  reg2 = y[i];
>> botLoad2:
>>
>> topAdd:    regz = reg1 + reg2;
>> botAdd:
>>
>> topStore:  z[i] = regz;
>> botStore:
>>
>> Then, I have a table of fragment addresses (topLoad1, topLoad2, etc.)
>> and lengths (botLoad1-topLoad1, botLoad2-topLoad2), and a
>> (statically unknown) list of fragments to be assembled to build
>> working code, e.g.:
>>
>>   (Load2, Load1, Add, Store, Loop)
>>
>> I assemble the fragments into a code buffer and jump to it,
>> or so the story goes. Unfortunately, what I'm seeing in the
>> generated code fragments is not fun:
>>
>> 1. GCC sometimes, but NOT always, inserts jumps to the next
>>     fragment. E.g.:
>>
>> ----------------------------------------------
>>
>> .L46:
>>          .loc 2 34 0
>>          movq    -264(%rbp), %rax
>>          movq    %rax, -40(%rbp)
>> .L47:
>> .L7:
>>          .loc 2 40 0
>>          movl    %r8d, %eax
>>          jmp     .L48
>> .L6:
>> .L48:
>>          .loc 2 43 0
>>          movl    %r11d, %ecx
>> .L49:
>> .L50:
>> ----------------------------------------------
>>
>> Note the jmp .L48. If GCC always inserted a jump, I could
>> remove it, or if it never inserted the jump, I'd be even
>> happier, but it only does it now and then. I tried adding
>> my own jumps to force this:
>>
>> topLoad2:  reg2 = y[i];
>>             goto botLoad2;
>> botLoad2:
>>
>> but GCC removed them. And inserted others.
>>
>> Today, I'm using these compiler options:
>>
>> gcc  -O0 -ggdb -mtune=opteron -fno-align-labels -fno-align-jumps
>>
>> So, I welcome suggestions on how to solve or work around these
>> problems. Or even a completely different approach.
>>
>> Thanks,
>> Robert
>>
>>
>>
>>
> 


