Poor man's JIT compiler

Robert Bernecky bernecky@snakeisland.com
Mon Sep 7 19:46:00 GMT 2009


Thanks for the hand, Dean.

I've run out of time to work on this until next month,
but want you to know that I appreciate you (and others) taking your
time to reply to me on this topic.

-fPIC seems to help a fair bit on the embedded jmp problem.

In summary, here is where I think I stand on threaded/JIT compilation
with label pointers:

1. Using label pointers with jumps works, but is
    quite slow, particularly when the code fragment
    sizes are small (which they are in my case).

2. Keeping the compiler from generating jmp instructions in
    such code fragments is tricky, at best, and is going
    to be a fragile area, in the sense of being very sensitive
    to compiler changes, code fragment contents, etc.

Thanks again,
Bob
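
For readers following along, point 1 (label pointers with jumps) can be
sketched as below. This is a minimal illustration, not the code discussed
in this thread: the opcode names, the run() function and the NEXT() macro
are hypothetical, and it relies on the GNU C labels-as-values extension
(&&label and goto *ptr), so it needs gcc or a compatible compiler.

```c
#include <assert.h>

/* Hypothetical opcodes, invented for this illustration. */
enum { OP_LOAD1, OP_LOAD2, OP_ADD, OP_STORE, OP_HALT };

/* Interpret prog[] for one index i, computing z[i] = x[i] + y[i].
   Each opcode ends with an indirect jump to the next opcode's label. */
static void run(const int *prog, const int *x, const int *y, int *z, int i)
{
    /* Table of label addresses, indexed by opcode (GNU C extension). */
    static void *dispatch[] = { &&load1, &&load2, &&add, &&store, &&halt };
    int reg1 = 0, reg2 = 0, regz = 0;
    const int *pc = prog;

/* Check the opcode, then jump indirectly to its label. */
#define NEXT() do { assert(*pc >= 0 && *pc <= OP_HALT); \
                    goto *dispatch[*pc++]; } while (0)

    NEXT();

load1: reg1 = x[i];        NEXT();
load2: reg2 = y[i];        NEXT();
add:   regz = reg1 + reg2; NEXT();
store: z[i] = regz;        NEXT();
halt:  return;

#undef NEXT
}
```

Calling run() with the program (OP_LOAD2, OP_LOAD1, OP_ADD, OP_STORE,
OP_HALT) once per index fills z element by element. Every fragment pays
for an indirect jump, which is presumably why this is slow when the
fragments are as small as a single load or add.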

Dean Anderson wrote:
> That's great. It's not quite fixed yet.
> 
> You can't turn off alignment where it would result in unexecutable code.  
> If I recall my x86 assembly, labels have to be word aligned, while the
> next instruction doesn't always have to be.  Depending on the distance
> to the alignment required, an unconditional jmp might be a short cut.  
> Adding the nop just removed the opportunity for an unconditional jmp.
> 
> You'll have to find what happens for every case of misalignment. You
> might still need to take care to only insert the nop when necessary. 
> That is, if your last instruction just happens to come perfectly
> aligned, the nop might cause another unconditional jmp to be
> inserted.
> 
> Still, it occurs to me that the JMP, if it were position independent,
> and the next buffer is always at the right alignment (the target of the
> JMP), should cause no trouble.  Try adding -fPIC to your compiler
> options.
> 
> 		--Dean
> 
> 
> 
> On Wed, 2 Sep 2009, Robert Bernecky wrote:
> 
>> Hi, Dean.
>>
>> My initial attempt at compiler options was just -O0.
>>
>> That resulted in the jmp insertion problem, so I conjectured
>> that there might be some alignment requirements/desires that
>> would result in jmp instructions being added to make each
>> labeled fragment start on an "appropriate" boundary.
>> Clearly, the no-align options did not help.
>>
>> So, I just tried out your suggestion:
>>
>> #define OP(nm, cod)  \
>> FS##nm: cod          \
>>     asm("nop" : : );  \
>> FE##nm:
>>
>> This has the effect of inserting a NOP at the end of each code fragment.
>> And, it DOES appear to work (although I just quickly eyeballed
>> the asm code, so I might be missing something). I'll give
>> it a more careful workover tomorrow. (That was WITH the current
>> -fno-align options still active.)
>>
>> Now, what was it that led you to propose that inserting a NOP
>> would have the desired effect?
>>
>> Many thanks for your reply!
>> Robert
>>
>> Dean Anderson wrote:
>>> I suspect it does this because of instruction alignment and pipelining
>>> issues.   Why are you trying to turn off alignment?
>>>
>>> You might try adding a nop after each one. 
>>>
>>> 		--Dean
>>>
>>> On Tue, 1 Sep 2009, Robert Bernecky wrote:
>>>
>>>> I'm trying to get gcc version 4.3.2 to emit X86-64 code
>>>> fragments that I can catenate to perform my own JIT
>>>> compilation, but the compiler is being recalcitrant.
>>>>
>>>> (I was using a jump table, but its performance was underwhelming.)
>>>>
>>>> Roughly, what I've done is to create a set of code fragments,
>>>> with labels so that I can determine their address ( via &&label)
>>>> and length. E.g.,
>>>>
>>>> topLoad1:  reg1 = x[i];
>>>> botLoad1:
>>>>
>>>> topLoad2:  reg2 = y[i];
>>>> botLoad2:
>>>>
>>>> topAdd:    regz = reg1 + reg2;
>>>> botAdd:
>>>>
>>>> topStore:  z[i] = regz;
>>>> botStore:
>>>>
>>>> Then, I have a table of fragment addresses (topLoad1, topLoad2, etc.)
>>>> and lengths (botLoad1-topLoad1, botLoad2-topLoad2), and a
>>>> (statically unknown) list of fragments to be assembled to build
>>>> working code, e.g.:
>>>>
>>>>   (Load2, Load1, Add, Store, Loop)
>>>>
>>>> I assemble the fragments into a code buffer and jump to it,
>>>> or so the story goes. Unfortunately, what I'm seeing in the
>>>> generated code fragments is not fun:
>>>>
>>>> 1. GCC sometimes, but NOT always, inserts jumps to the next
>>>>     fragment. E.g.:
>>>>
>>>> ----------------------------------------------
>>>>
>>>> .L46:
>>>>          .loc 2 34 0
>>>>          movq    -264(%rbp), %rax
>>>>          movq    %rax, -40(%rbp)
>>>> .L47:
>>>> .L7:
>>>>          .loc 2 40 0
>>>>          movl    %r8d, %eax
>>>>          jmp     .L48
>>>> .L6:
>>>> .L48:
>>>>          .loc 2 43 0
>>>>          movl    %r11d, %ecx
>>>> .L49:
>>>> .L50:
>>>> ----------------------------------------------
>>>>
>>>> Note the jmp .L48. If GCC always inserted a jump, I could
>>>> remove it, or if it never inserted the jump, I'd be even
>>>> happier, but it only does it now and then. I tried adding
>>>> my own jumps to force this:
>>>>
>>>> topLoad2:  reg2 = y[i];
>>>>             goto botLoad2;
>>>> botLoad2:
>>>>
>>>> but GCC removed them. And inserted others.
>>>>
>>>> Today, I'm using these compiler options:
>>>>
>>>> gcc  -O0 -ggdb -mtune=opteron -fno-align-labels -fno-align-jumps
>>>>
>>>> So, I welcome suggestions on how to solve or work around these
>>>> problems. Or even a completely different approach.
>>>>
>>>> Thanks,
>>>> Robert
>>>>
>>>>
>>>>
>>>>
>>
>>
> 
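
The fragment table described in the first message of this thread
(addresses such as topLoad1, lengths such as botLoad1-topLoad1, and a
list of fragments chosen at run time) can be sketched as follows. This
shows only the bookkeeping: struct fragment and assemble() are
hypothetical names, the fragments are treated as opaque byte ranges, and
nothing here makes the buffer executable or jumps into it. Whether
compiler-generated fragments survive being copied this way is exactly
the fragile part the thread is about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table entry per fragment: its start address and its length
   in bytes (bot label minus top label in the thread's scheme). */
struct fragment {
    const unsigned char *start;
    size_t length;
};

/* Copy the fragments named in plan[] back to back into buf.
   Returns the number of bytes written, or 0 if buf is too small
   (an empty plan also yields 0). */
static size_t assemble(const struct fragment *table, const int *plan,
                       size_t nplan, unsigned char *buf, size_t cap)
{
    size_t used = 0;

    assert(table != NULL && plan != NULL && buf != NULL);
    for (size_t k = 0; k < nplan; k++) {
        const struct fragment *f = &table[plan[k]];
        if (f->length > cap - used)   /* used <= cap always holds here */
            return 0;
        memcpy(buf + used, f->start, f->length);
        used += f->length;
    }
    return used;
}
```

With a plan such as (Load2, Load1, Add, Store, Loop), each entry of
plan[] would be that fragment's index in the table. A real JIT would
additionally need an executable buffer and fragments free of
position-dependent jumps.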


