Lhogho  0.0.027
Code generation

When Lhogho compiles Logo source it uses code generation patterns. I.e. it recognizes specific patterns in the user program and generates code according to some predefined templates.

Templates are extremely processor-specific. Because currently Lhogho compiles for i386 processors (like Pentium family, for example) the code presented in this page will use i386 assembly mnemonics.

Registers

Lhogho tries to be consistent in respect to how processor registers are used. The list below describes the usage of some of the registers:

Stack structure

The stack is used to keep a variety of data - function parameters, static pointer, dynamic pointer, local variables, and information about repeat commands.

The stack structure inside user-defined functions is:

Offset Type Contents Macro (vars.h)
EBP+16 atom_t 1st parameter passed from caller BASE_OFFSET_PARAMS+4
EBP+12 atom_t 2nd parameter passed from caller BASE_OFFSET_PARAMS
EBP+8 int number of parameters
EBP+4 int Returned address (generated by the CALL instruction) none
EBP int Dynamic frame pointer (old EBP). Used to access the stack frame of the calling procedure. none
EBP-4 int Static frame pointer (old ESI). Used to access the stack frame of the parent procedure. BASE_OFFSET_STATIC
EBP-8 atom_t The var atom of the procedure. BASE_OFFSET_PARENT
EBP-12 atom_t Repeat chain BASE_OFFSET_REPEATCHAIN
EBP-16 int Value for TEST, IFTRUE, IFFALSE BASE_OFFSET_TEST
EBP-20 atom list of local variables created at runtime BASE_OFFSET_LOCALS
EBP-24 atom 1st local variable BASE_OFFSET_LOCALS-4
EBP-28 atom 2nd local variable BASE_OFFSET_LOCALS-8

Code for TO..END

Lhogho generates code only for the user-defined functions/commands/operators (i.e. those which are defined in Logo source code). Primitives are defined in C and are precompiled into the compiler. The main Logo program is also considered as a user-defined function with one exception -- the repeat chain which is pushed in the stack is not the unbound atom, but an atom which has REPCOUNT = -1.

Logo:
to name...
   local "local1
   local "local2
   :

   {command1}
   {command2}
   {command3}
   :
end
Assembler:
name:
  push   ebp             ; prologue
  mov    ebp,esp
  push   esi

  push   "(unbound)"     ; repeat chain
  call   use
  push   2               ; initial value for TEST
  mov    eax,(unbound)   ; default result

  push   "(unbound)"     ; initialization of
  call   use             ; local variables
  push   "(unbound)"
  call   use

 {command1}              ; function body
 {command2}
 {command3}

  mov    ebx,0

exit:
  call   deuse           ; finalization of
  call   deuse           ; local variables

  pop    edx             ; value for TEST
  call   deuse           ; repeat chain

  pop    esi             ; epilogue
  mov    esp,ebp
  pop    ebp
  ret

The generated code for each user-defined functions contains prologue, body and epilogue. The prologue initializes the function by setting the frame and the stack pointers, while the epilogue reverts them to their original values. Both the prologue and the epilogue do not correspond to any part of the Logo source.

The body is a sequence of calls to commands.

Note:
Functions, commands and operators (both primitive and user-defined) always return an atom:
  • error atom - the execution generated error
  • stopped atom - the execution stopped because of STOP command
  • unbound atom - the execution did not generate any result
  • any other atom - this is the result value produced by the execution of the compiled code. It is up to the caller to decide what to so with this value.

Exiting from the definition is done by a jump to the exit label.

Code for a command-call

The code generated for a command-call is composed of two fragments - an expression-call (i.e. the command is treated as expression); and result verification code.

Logo
command param1 param2
Assembler
 {expr}
  push   {source}
  call   rt_cmdchk
  cmp    [eax+OFFSET_ID],ERROR_ID
  je	  exit

{Expr} is a fragment of code where the Logo command-call is compiled as if it is an expression. The result of {expr} is stored in the stack and rt_cmdchk function checks whether it is legal for a command.

If the result of {expr} is:

If rt_cmdchk returned an error atom, then execution jumps to the exit point of the function.

rt_cmdchk accepts two inputs. The first one is the result of the command treated as expression, and the second one, i.e. {source}, is the source code corresponding to the command. This source code is used to set the error position if needed. It is also used for the runtime trace mode.

Code for a function-call

The code generated for a function-call (or an operator-call) is similar to the one generated for a command-call - only the verification code is different.

Logo:
  ... (functon param1 param2) ...
Assembler:
 {expr}
  push   {source}
  call   rt_funchk
  push   eax

{Expr} is a fragment of code where the Logo function-call is compiled as if it is an expression. The result of {expr} is stored in the stack and rt_funchk function checks whether it is legal for a function or an operator.

If the result of {expr} is:

The overall outcome of the code for functions (if there is no error) is the the final value is pushed in the stack.

If rt_funchk returned an error atom, then execution does not jump to the exit point of the function.

rt_funchk accepts two inputs. The first one is the result from the function, and the second one is the source code corresponding to the function-call. This source code is used to set the error position.

Code for an expression (function, command, operator)

Expression are massively generated. Each call to a function or a command is based on expressions. The code generated for an expression leaves the result in the stack. The code generated for an expression depends on whether the expression is a constant or it has functions or operators.

(a) constant

Expressions which are contants (i.e. numbers, word-constans and list-constants) are pushed directly into the stack. Their reference count is incremented with use.

Logo:
  const
Assembler:
  push   const
  call   use

(b) primitive

Expression which uses a primitive command, function or operator is compiled into three fragments: (1) all inputs are compiled as standalone expressions (i.e. constants or function-calls), (2) code for the actual call of the primitive, and (3) code to release all inputs.

Logo:
  primitive input1 input2
Assembler:
 {expr}
 {expr}
  call   {primitive}
  call   deuse
  call   deuse
  push   eax

The two {expr} fragments stand for the code generated for the two inputs. Their results appear pushed in the stack. {Primitive} is the address of the primitive function/command/operator. The result of this call is stored in EAX regiter. For each input there is a call to deuse() function which removes it from the stack. deuse() does not modify EAX register, thus the last instruction is to push its contents in the stack. This value is the result of the call of the primitive.

(c) variadic

Variadic call is a call which number of inputs could vary. Such calls are compiled in the same way as non-variadic ones, except that the number of inputs is explicitely pushed before the call and popped after it.

Logo:
  (function input ... input)
Assembler:
  :
  push   {n}
  call   {function}
  pop    edx
  :

The constant {n} is an integer number - the actual number of inputs.

(d) user-defined

Expression which uses a user-defined command, function or operator is compiled as the primitive expression (see (a)) expect that a static link is calculated and stored in ESI register before the call.

A user-defined function may be recompiled (thus changing its address) or could be still uncompiled (i.e. no address is defined for it) during the compilation of a call to this function. To resolve this problem in realtime the function is called through its address stored in function's descriptor.

Logo:
  userfunc input1 input2
Assembler:
 {expr}
 {expr}
 {static}
  call   [{userfunc}]
  call   deuse
  call   deuse
  push   eax

(e) system variables

Some functions require access to system defined variables. For example, print, show and type use the values of printdepthlimit, printwidthlimit and fullprintp. Similarily, equalp, notequalp, memberp and member use the calue of caseignoredp.

The values of such system variables are explicitely pushed just before the actual call. If the call is variadic, the system variables are pushed after the number of inputs.

Logo:
  (print input input...)
Assembler:
 {expr}
  :
 {expr}

  push   {n}

  push   {value of system var1}
  call   use
  :
  push   {value of system var1}
  call   use

  call   {print}

  call   deuse
  :
  call   deuse

  pop    edx

  call   deuse
  :
  call   deuse

Code for static link

Whenever Lhogho need to generate code to access a user-defined variable or prepare a static link for user-defined function, command or operator, it is needed to provide mechanism to find the appropriate context (i.e. stack frame). If the accessed object is in the current context, then it could be accessed directly through EBP register.

If it is in some upper level of lexical scope hierarchy, then the access is done by calculating the context in ESI register.

Code for :VAR

There are two types of variables - primitive (system) and user-defined. They are accessed in different ways, because primitive variables are global and their values are in fixed memory locations. User-defined variables have their values stored in the stack.

(a) system variables

System (i.e. primitive) variables are created during the initialization of the compiler. They are stored in the globals variable. The value of each system variable is at a memory address which is known at compile time. Thus the code for referencing a system variable is:

  push  [{value}]
  call  use

where {value} is the address whether the value of the variable is stored.

(b) user-defined variables

Code generated to access the value of a variable depends on whether the variable is in the current context (in this case EBP is used) or in an upper context (ESI is calculated and used). The offset of the variable within context is fixed and does not depend on from what context the variable is being accessed.

The code to access a variable pushes its value in the stack and also returns it in EAX register:

The code for {static} fragment is the code which calculates the static link to the context frame of the variable.

Code for MAKE

The MAKE command is a special case and it is not compiled as an ordinary command. This is done in order to maintain higher performance and to avoid some difficulties associated with assignments (like lvalues and rvalues).

Depending on the arguments MAKE can be compiled in one of these ways:

(a) direct

When the first input of MAKE is a word-constant, then Lhogho can assist in finding the location of the variable's value during compile time. This contributes to a higher performance in run-time and is called direct MAKE.

Then Lhogho finds a variable it can be in the same context level or in an upper one. If the context level is the same, the generated code is:

Logo:
  make "{name} {expr}
Assembler:
 {expr}
  push   [ebp-{offset}]
  call   deuse
  mov    eax,[esp]
  mov    [ebp-{offset}],eax
  push   {source}
  call   rt_makechk
  cmp    [eax+OFFSET_ID],ERROR_ID
  je	  exit

If the variable is defined in an upper context then code for a static link is generated and the variable is referenced through ESI register.

Logo:
  local "{name}

  to ...
     to ...
        to ...  
           make "{name} {expr}
        end
     end
  end
Assembler:
 {expr}
 {static}
  push   [esi-{offset}]
  call   deuse
  mov    eax,[esp]
  mov    [esi-{offset}],eax
  push   {source}
  call   rt_makechk
  cmp    [eax+OFFSET_ID],ERROR_ID
  je	  exit

{Expr} is the code generated for the second input of MAKE. The execution of {expr} should put in the stack the new value of the variable.

There is no code for the first input, because it is used explicitely. {Static} is the code which calculates the static link to the context frame of the variable. The value of {offset} is the offset of the variable related to the frame pointer (i.e. EBP or ESI). The offset is calculated at compile-time.

The verification of the result is done with rt_makechk which is similar to rt_cmdchk.

When a new value is assigned to a variable its old value is deused.

Code for OUTPUT

The output command is compiled in a special way by Lhogho, because its execution does not go in the normal way. The generated code for output moves the input of the command in EAX register and jumps directly to the exit label.

Logo:
output {value}
Assembler:
 {value}
  mov    ebx,1
  pop    eax
  jmp    exit

When Lhogho is compiled in ADVANCED mode and when the -Zrt compiler option is used, then the generated code includes instructions to dump the source code of output command.

Logo:
output {value}
Assembler:
 {value}
  push  {source}
  call   rt_dump
  mov    ebx,1
  pop    eax
  jmp    exit

Code for IF

There are two cases of if command - with two and with three inputs. They are compiled in similar, but yet different ways.

The beginning of the generated code is the same. After the code for the condition, there is a check for whether the result of the condition expression is true or false. The result of rt_boolchk is in EAX register, which could be either the true atom, the false atom or an error atom.

Logo:
if {cond} {then}
Assembler:
 {cond}
  call   rt_boolean
  cmp    [eax+OFFSET_ID],ERROR_ID
  je     exit
  cmp    eax,FALSE
  je     else
then:
 {then}
else:
  mov    eax,{unbound}

If the if command has three parameters, then the generated code is:

Logo:
if {cond} {then} {else}
Assembler:
 {cond}
  call   rt_boolean
  cmp    [eax+OFFSET_ID],ERROR_ID
  je     exit
  cmp    eax,FALSE
  je     else
then:
 {then}
  jmp    next
else:
 {else}
next:
  mov    eax,{unbound}

Code for REPEAT

The repeat command is implemented in a special way -- it uses the repeat chain to keep track of the number of left and passed repetitions. The repeat chain is a list located in the local stack - it is like a system-defined local variable. The elements of the list are integer atoms. The actual 64-bit number is split into two 32-bit numbers - one of them is left for repcount, the other is used to count down the number of repetitions left. These two values could be accessed with REPCOUNT and :REPLIMIT macros.

The first element of the repeat chain corresponds to the current repeat. The second element is for the immediate outer repeat. The last element is for the outer-most repeat.

When a new repeat is initiated by rt_repeat_enter, a new atom is added to the beginning of the repeat chain. When a repeat is completed by rt_repeat_exit then the first element of the chain list is removed.

There are two cases of repeat command - when the number of repetition is known at compile time, and when it is expression to be evaluated at run-time.

When the repetition count is known, the generated code is:

Logo:
repeat {const} {block}
Assembler:
  push  {const}
  push  ebp
  call  rt_repeat_enter
repeat:
 {block}
  mov    eax,[ebp+REPEATCHAIN]
  mov    eax,[eax+CAR_OFFSET]
  inc    [eax+REPCOUNT]
  dec    [esp+REPLIMIT]
  jnz    repeat
  push   ebp
  call   rt_repeat_exit

If the number of repetitions is 0, then no code for repeat command is generated.

When the repetition count is not known - i.e. it is an expression - then the generated code is:

Logo:
repeat {expr} {block}
Assembler:
 {expr}
  call   rt_repchk
  cmp    [eax+ID],ERROR_ID
  je     exit
  push   eax
  mov    eax,[eax+REPLIMIT]
  call   deuse
  cmp    eax,0
  je     skip
  push   eax
  push   ebp
  call   rt_repeat_enter
repeat:
 {block}
  mov    eax,[ebp+REPEATCHAIN]
  mov    eax,[eax+CAR_OFFSET]
  inc    [eax+REPCOUNT]
  dec    [esp+REPLIMIT]
  jnz    repeat
skip:
  push   ebp
  call   rt_repeat_exit

Code for FOREVER

The forever command is implemented similarily to repeat -- it uses the repeat chain to keep track of the number of passed repetitions. The generated code is:

Logo:
forever {block}
Assembler:
  push  ebp
  call  rt_forever_enter
forever:
 {block}
  mov    eax,[ebp+REPEATCHAIN]
  mov    eax,[eax+CAR_OFFSET]
  inc    [eax+REPCOUNT]
  jmp    forever

Code for FOR

The for command is implemented in a way similar to repeat. However, for is a variable generating command - it creates a local variable with name given as the first input of for. This name must be a word literal.

The second input is a list constant with two or three expressions. The code of these expressions is placed at the beginning of the compiled code for for.

First the code of the first expressions in {limits} is pushed. It pushes in the stack the initial value of the control variable. This value is then dupicated and assigned "manually" to the control variable in a way similar to make but without result checking.

The the code for the second expression in {limits} is generated, followed by the code of the third expression. If the third expression is missing, then generate code that pushes the :UNBOUND value.

The prologue code for for is the same as the one for repeat. The only difference is that the check function rt_forchk not only checks the limits but also calculates and sets the step if it is not provided.

The epilogue code for for is the same as the one for repeat. This it uses the same function rt_repeat_exit to fix the repeat chain.

Logo:
for {var} {limits} {block}
Assembler:
 {limits.1}
  pop    eax
  push   eax
  push   eax
  push   [ebp-{offset}]
  call   deuse
  mov    eax,[esp]
  mov    [ebp-{offset}],eax
 {limits.2}
 {limits.3 or unbound}
  push   {addr of ste var's value}
  push   {source}
  call   rt_forchk
  cmp    [eax+ID],ERROR_ID
  je     exit
  push   eax
  mov    eax,[eax+REPLIMIT]
  call   deuse
  cmp    eax,0
  je     skip_for
  push   eax
  push   ebp
  call   rt_repeat_enter
repeat:

 {block}

  mov    eax,[ebp+REPEATCHAIN]
  mov    eax,[eax+CAR_OFFSET]
  inc    [eax+REPCOUNT]
  dec    [esp+REPLIMIT]
  jnz    repeat
skip_for:
  push   ebp
  call   rt_repeat_exit

Code for WHILE, DO.WHILE, UNTIL and DO.UNTIL

The while and until commands are compiled in a way to check the condition at the beginning of every loop. The code for the condition leaves the result in register EAX. The only difference between the code for while and for until is in one cmp instruction and in the label names. The generated code is:

Logo:
while {cond} {block}
or
until {cond} {block}
Assembler:
while:
 {cond}
  push  {source}
  call   rt_whlchk
  cmp    [eax+ID],ERROR_ID
  je     exit
  push   eax
  mov    eax,[eax+INT]
  call   deuse
  cmp    eax,0(for while) or 1(for until)
  je     skip
 {block}
  jmp    while
skip:
  push   (unbound)
  call   use
  pop    eax

The do.while and do.until commands are compiled in a similar way to while and until except that the block of commands is placed before the check of the condition. Thus the commands are executed at least once. The generated code is:

Logo:
do.while {block} {cond}
or
do.until {block} {cond}
Assembler:
do.while:
 {block}
 {cond}
  push  {source}
  call   rt_whlchk
  cmp    [eax+ID],ERROR_ID
  je     exit
  push   eax
  mov    eax,[eax+INT]
  call   deuse
  cmp    eax,0(for while) or 1(for until)
  je     skip
  jmp    while
skip:
  push   (unbound)
  call   use
  pop    eax

Code for CATCH

The catch command/function is compiled in a way to check the result of some executed commands and mask any exceptions if the tags match (the tag of the catch and the tag of the throw).

The code for catch has a special feature - the exit address is temporarily changed to point to the catch epilogue. This ensures that the epilogue is the exit point of any condition happening inside the catch's body.

The generated code for procedural catch is:

Logo:
catch {tag} {block}
Assembler:
  jmp    catch
exit:
  jmp    exit_catch
catch:
  {block}
exit_catch:
  push   eax
  {tag}
  call   rt_catchchk
  push   eax

The generated code for functional catch is:

Logo:
catch {tag} {block}
Assembler:
  jmp    catch
exit:
  jmp    exit_catch
catch:
  {block}
exit_catch:
  {tag}
  call   rt_catchchk
  push   eax

Code for GOTO

The goto command is compiled in two different ways depending on wheter the target tag is found at compile time. If it is found, then the generated code is:

Logo:
goto {tag}
Assembler:
  jmp    {tag}

If the target tag is an expression, the target is calculated at run-time. The value of the expression is a name of a local variable of type VAR_TYPE_TAG which VALUE is an integer atom. This atom is the target address relative to the beginning of the memory block of compiled code.

The generated code for goto which is expression is:

Logo:
goto {tag}
Assembler:
  push   {tag source}
 {tag}
  push   {parent}
  push   {static link}
  call   rt_goto
  pop    edx
  pop    edx
  call   deuse
  call   deuse
  cmp    [eax+ID],ERROR_ID
  je     exit
  push   eax
  mov    eax,[eax+INT]
  call   deuse
  add    eax,{mem block addr}
  jmp    eax

Code for IFTRUE and IFFALSE

The iftrue and iffalse commands test the input value of the latest executed local test command. The test command write 0 or 1 in the local stack at location [EBP+BASE_OFFSET_TEST]. 0 means the input of latest test was false. 1 means it was true.

In procedure prologue the value 2 is written in this location, thus initially it is neither true nor false.

Both commands iftrue and iffalse are compiled in the same way. The only difference is a constant to test with.

The generated code for iftrue is:

Logo:
iftrue {block}
Assembler:
  cmp    [ebp+BASE_OFFSET_TEST],1
  jne    skip
 {block}
skip:

The generated code for iffalse is:

Logo:
iffalse {block}
Assembler:
  cmp    [ebp+BASE_OFFSET_TEST],1
  jne    skip
 {block}
skip:

Code for EXTERNAL

When an empty Logo function is made external, this means that the body of the function is compiled in another language. The main issue is that the external procedure expects parameters to be in the processor's native data format.

Externalization of a Logo function builds a wrapper that converts Logo data for parameters into native format, then executes the external function and at the end converts its result back into Logo atom.

The wrapper clears a global error flag and checks it after the call of the external function. This flag is set by internal Lhogho function (if such is called from the external non-Lhogho function).

The generated code for wrapper of external functions is:

Logo:
to func :a :b
end

external "func [i4 func i4 i4] :lib
Assembler:
  push  ebp
  mov   ebp,esp

  ; clear error flag
  mov   [error_flag],0 

  push  [ebp+{ofs}]   ; push param 2
  call  atom_to_i4    ; convert atom to native
  {push native}       ; push param in native format

  push  [ebp+{ofs}]   ; push param 1
  call  atom_to_i4    ; convert atom to native
  {push native}       ; push param in native format

  call  {external_func}

  ; check error flag
  cmp   [error_flag.id],0
  je    $ok
  mov   eax,[error_flag]
  jmp   exit

ok:
  {push native}       ; push result in native format
  call  i4_to_atom    ; convert native to atom

exit:
  mov   esp,ebp
  pop   ebp
  ret

Pushing datum in native format is compiled in different ways depending on the size and the type of the datum. Integer values up to 32 bits (signed, unsigned, cardinal, enumerations, etc) are returned in EAX register and are pushed with this code:

Assembler:
  push   eax

Integer values larger then 32 bits and up to 64 bits are returned in EDX:EAX registers and are pushed with this code:

Assembler:
  push   edx
  push   eax

Floating point values up to 32 bits are returned in the stack of the FPU. The generated code reserves space in the normal stack with a push command, and then stores the floating-point value.

Assembler:
  push   eax
  fstp   [esp] ; 32-bit store

Floating point values from 33 up to 64 bits are processed in a similar way. The generated code is:

Assembler:
  push   edx
  push   eax
  fstp   [esp] ; 64-bit store

Note that the values pushed from EDX and EAX when the datun is floating point value are ignored and overwritten by the FSTP instruction.

Code for INTERNAL

When a Logo function is made internal, this means it can be called from a function compiled in another language. The main issue is that the internal procedure expects parameters to be atoms, which the non-Logo function will pass processor's native data format.

Internalization of a Logo function builds a wrapper that converts native parameters to Logo atoms, then executes the Logo function and at the end converts its result back into native format.

The generated code for wrapper of internal functions is:

Logo:
to func :a :b
   ...
   output :res
end

internal "func [i4 i4 i4]
Assembler:
  {prologue}          ; standard prologue code

  push  [ebp+{ofs}]   ; push native param 2
  call  i4_to_atom    ; convert native to atom
  push  eax           ; push atom

  push  [ebp+{ofs}]   ; push native param 1
  call  i4_to_atom    ; convert native to atom
  push  eax           ; push atom

  mov   [backup_frame],ebp  ; store frame
  mov   ebp,[root_frame]    ; load frame of root
  mov   esi,ebp             ; set static link = dynamic link
  call  {internal_func}
  mov   ebp,[backup_frame]  ; restore frame

  cmp   [eax+{id}],error
  je    $skip
  mov   [error_flag],eax
$skip:

  push  eax           ; push result atom
  call  atom_to_i4    ; convert atom_to_native

  call  deuse         ; free param 1
  call  deuse         ; free param 2

  {epilogue}          ; standard epilogue code
  ret

The wrapper for internal functions sets explicitely the frame pointer before calling the internal function. This is needed to make sure that the internal function can access Logo variables. If the frame pointer is not set, then the execution will be confused, because it will not understand the stack structure of the calling function (the calling function is written in another language).


[ HOME | INDEX | ATOMS | VARS | REFERENCE ]
Lhogho Developer's Documentation
Tue Feb 7 2012