Reef Compiler Guide
Quick reference for compiling to Reef bytecode.
Bytecode Formats
ReefVM supports two bytecode formats:
- String format: Human-readable text with opcodes and operands
- Array format: TypeScript arrays with typed tuples for programmatic generation
Both formats are compiled using the same toBytecode() function.
Bytecode Syntax
Instructions
OPCODE operand ; comment
Operand Types
Immediate numbers (#N): Counts or relative offsets
- MAKE_ARRAY #3 - count of 3 items
- JUMP #5 - relative offset of 5 instructions (prefer labels)
- PUSH_TRY #10 - absolute instruction index (prefer labels)
Labels (.name): Symbolic addresses resolved at parse time
- .label: - define label at current position
- JUMP .loop - jump to label
- MAKE_FUNCTION (x) .body - function body at label
Variable names: Plain identifiers (supports Unicode and emoji!)
- LOAD counter - load variable
- STORE result - store variable
- LOAD 💎 - load emoji variable
- STORE 変数 - store Unicode variable
Constants: Literals added to constants pool
- Numbers: PUSH 42, PUSH 3.14
- Strings: PUSH "hello" or PUSH 'world'
- Booleans: PUSH true, PUSH false
- Null: PUSH null
Array Format
The programmatic array format uses TypeScript tuples for type safety:
import { toBytecode, run } from "#reef"
const bytecode = toBytecode([
["PUSH", 42], // Atom values: number | string | boolean | null
["STORE", "x"], // Variable names as strings
["LOAD", "x"],
["HALT"]
])
const result = await run(bytecode)
Operand Types in Array Format
Atoms (number | string | boolean | null): Constants for PUSH
["PUSH", 42]
["PUSH", "hello"]
["PUSH", true]
["PUSH", null]
Variable names: String identifiers
["LOAD", "counter"]
["STORE", "result"]
Label definitions: Single-element arrays starting with . and ending with :
[".loop:"]
[".end:"]
[".function_body:"]
Label references: Strings in jump/function instructions
["JUMP", ".loop"]
["JUMP_IF_FALSE", ".end"]
["MAKE_FUNCTION", ["x", "y"], ".body"]
["PUSH_TRY", ".catch"]
Counts: Numbers for array/dict construction
["MAKE_ARRAY", 3] // Pop 3 items
["MAKE_DICT", 2] // Pop 2 key-value pairs
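The operand kinds listed above can be written down as a TypeScript union. The following is a minimal sketch only: the type names `Atom` and `Instruction` and the helper `isLabelDefinition` are hypothetical and are not taken from ReefVM's exports, and the union covers only the opcodes shown in this section.

```typescript
// Hypothetical types for illustration; ReefVM's real exported types may differ.
type Atom = number | string | boolean | null;

type Instruction =
  | [string] // label definition (".loop:") or an opcode without operands ("ADD")
  | ["PUSH", Atom]
  | ["LOAD" | "STORE", string]
  | ["JUMP" | "JUMP_IF_FALSE" | "JUMP_IF_TRUE" | "PUSH_TRY", string]
  | ["MAKE_ARRAY" | "MAKE_DICT", number]
  | ["MAKE_FUNCTION", string[], string];

// A label definition is a single-element tuple starting with "." and ending with ":".
function isLabelDefinition(ins: Instruction): boolean {
  const head = ins[0];
  return (
    ins.length === 1 &&
    head.charAt(0) === "." &&
    head.charAt(head.length - 1) === ":"
  );
}

const program: Instruction[] = [
  [".loop:"],
  ["PUSH", 42],
  ["STORE", "x"],
  ["JUMP", ".loop"],
];
```

A union like this lets the TypeScript compiler reject a malformed tuple such as `["MAKE_ARRAY", "3"]` before any bytecode is generated.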
Functions in Array Format
// Basic function
["MAKE_FUNCTION", ["x", "y"], ".body"]
// With defaults
["MAKE_FUNCTION", ["x", "y=10"], ".body"]
// Variadic
["MAKE_FUNCTION", ["...args"], ".body"]
// Named args
["MAKE_FUNCTION", ["@opts"], ".body"]
// Mixed
["MAKE_FUNCTION", ["x", "y=5", "...rest", "@opts"], ".body"]
Complete Example
const factorial = toBytecode([
["MAKE_FUNCTION", ["n", "acc=1"], ".fact"],
["STORE", "factorial"],
["JUMP", ".main"],
[".fact:"],
["LOAD", "n"],
["PUSH", 0],
["LTE"],
["JUMP_IF_FALSE", ".recurse"],
["LOAD", "acc"],
["RETURN"],
[".recurse:"],
["LOAD", "factorial"],
["LOAD", "n"],
["PUSH", 1],
["SUB"],
["LOAD", "n"],
["LOAD", "acc"],
["MUL"],
["PUSH", 2],
["PUSH", 0],
["TAIL_CALL"],
[".main:"],
["LOAD", "factorial"],
["PUSH", 5],
["PUSH", 1],
["PUSH", 0],
["CALL"],
["HALT"]
])
const result = await run(factorial) // { type: "number", value: 120 }
String Format
Functions
MAKE_FUNCTION (x y) .body ; Basic
MAKE_FUNCTION (x=10 y=20) .body ; Defaults
MAKE_FUNCTION (x ...rest) .body ; Variadic
MAKE_FUNCTION (x @named) .body ; Named args
MAKE_FUNCTION (x ...rest @named) .body ; Both
Function Calls
Stack order (bottom to top):
LOAD fn
PUSH arg1 ; Positional args
PUSH arg2
PUSH "name" ; Named arg key
PUSH "value" ; Named arg value
PUSH 2 ; Positional count
PUSH 1 ; Named count
CALL
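A compiler can generate this sequence mechanically. The sketch below emits array-format tuples in the stack order shown above; `emitCall`, `Operand`, and `Instruction` are hypothetical names introduced for illustration, and the helper only handles constant arguments.

```typescript
type Operand = number | string | boolean | null | string[];
type Instruction = [string, ...Operand[]];
type Atom = number | string | boolean | null;

// Hypothetical helper: emit a call whose arguments are all constants.
// Named arguments are passed as [key, value] pairs to keep their order explicit.
function emitCall(
  fnName: string,
  positional: Atom[],
  named: [string, Atom][]
): Instruction[] {
  const out: Instruction[] = [["LOAD", fnName]];
  for (const arg of positional) {
    out.push(["PUSH", arg]); // positional args, in order
  }
  for (const [key, value] of named) {
    out.push(["PUSH", key]); // named arg key
    out.push(["PUSH", value]); // named arg value
  }
  out.push(["PUSH", positional.length]); // positional count
  out.push(["PUSH", named.length]); // named count
  out.push(["CALL"]);
  return out;
}

// Reproduces the sequence above: fn(arg1, arg2, name: "value")
const callSequence = emitCall("fn", ["arg1", "arg2"], [["name", "value"]]);
```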
Opcodes
Stack
- PUSH <const> - Push constant
- POP - Remove top
- DUP - Duplicate top
- SWAP - Swap top two values
- TYPE - Pop value, push its type as string
Variables
- LOAD <name> - Push variable value (throws if not found)
- TRY_LOAD <name> - Push variable value if found, otherwise push name as string (never throws)
- STORE <name> - Pop and store in variable
Arithmetic
- ADD, SUB, MUL, DIV, MOD - Binary ops (pop 2, push result)
Bitwise
- BIT_AND, BIT_OR, BIT_XOR - Bitwise logical ops (pop 2, push result)
- BIT_SHL, BIT_SHR, BIT_USHR - Bitwise shift ops (pop 2, push result)
Comparison
- EQ, NEQ, LT, GT, LTE, GTE - Pop 2, push boolean
Logic
- NOT - Pop 1, push !value
Control Flow
- JUMP .label - Unconditional jump
- JUMP_IF_FALSE .label - Jump if top is false or null (pops value)
- JUMP_IF_TRUE .label - Jump if top is truthy (pops value)
- HALT - Stop execution of the program
Functions
- MAKE_FUNCTION (params) .body - Create function, push to stack
- CALL - Call function (see calling convention above)
- TAIL_CALL - Tail-recursive call (no stack growth)
- RETURN - Return from function (pops return value)
- TRY_CALL <name> - Call function (if found), push value (if exists), or push name as string (if not found)
- BREAK - Exit iterator/loop (unwinds to break target)
Arrays
- MAKE_ARRAY #N - Pop N items, push array
- ARRAY_GET - Pop index and array, push element
- ARRAY_SET - Pop value, index, array; mutate array
- ARRAY_PUSH - Pop value and array, append to array
- ARRAY_LEN - Pop array, push length
Dicts
- MAKE_DICT #N - Pop N key-value pairs, push dict
- DICT_GET - Pop key and dict, push value (or null)
- DICT_SET - Pop value, key, dict; mutate dict
- DICT_HAS - Pop key and dict, push boolean
Unified Access
- DOT_GET - Pop index/key and array/dict, push value (null if missing)
Strings
- STR_CONCAT #N - Pop N values, convert to strings, concatenate, push result
Exceptions
- PUSH_TRY .catch - Register exception handler
- PUSH_FINALLY .finally - Add finally to current handler
- POP_TRY - Remove handler (try succeeded)
- THROW - Throw exception (pops error value)
Compiler Patterns
Function Definitions
When defining functions, you must prevent the PC from "falling through" into the function body during sequential execution. There are two standard patterns:
Pattern 1: JUMP over function bodies (Recommended)
MAKE_FUNCTION (params) .body
STORE function_name
JUMP .end ; Skip over function body
.body:
<function code>
RETURN
.end:
<continue with program>
Pattern 2: Function bodies after HALT
MAKE_FUNCTION (params) .body
STORE function_name
<use the function>
HALT ; Stop execution before function bodies
.body:
<function code>
RETURN
Important: Pattern 2 only works if you HALT before reaching function bodies. Pattern 1 is more flexible and required for:
- Defining multiple functions before using them
- REPL mode (incremental execution)
- Any case where execution continues after defining a function
Why? MAKE_FUNCTION creates a function value but doesn't jump to the body; it only records the body's address. Without a JUMP or HALT, the PC advances into the function body and executes it as top-level code.
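Pattern 1 is straightforward to generate from a compiler. The sketch below is one possible way to do it in array format; `emitFunctionDef` and `freshLabel` are hypothetical helpers, and the generated label names assume that labels may contain digits and underscores.

```typescript
type Operand = number | string | boolean | null | string[];
type Instruction = [string, ...Operand[]];

let labelCounter = 0;

// Hypothetical helper: returns a label name that is unique within one compilation.
function freshLabel(prefix: string): string {
  labelCounter += 1;
  return "." + prefix + "_" + labelCounter;
}

// Pattern 1: create and store the function, then JUMP over its body.
function emitFunctionDef(
  fnName: string,
  params: string[],
  body: Instruction[]
): Instruction[] {
  const bodyLabel = freshLabel(fnName + "_body");
  const endLabel = freshLabel(fnName + "_end");
  return [
    ["MAKE_FUNCTION", params, bodyLabel],
    ["STORE", fnName],
    ["JUMP", endLabel], // skip over the function body
    [bodyLabel + ":"],
    ...body,
    ["RETURN"],
    [endLabel + ":"], // the program continues here
  ];
}

// A function "double" that returns x * 2.
const doubleDef = emitFunctionDef("double", ["x"], [
  ["LOAD", "x"],
  ["PUSH", 2],
  ["MUL"],
]);
```

Because each definition carries its own end label, several definitions can be emitted back to back without any of them falling through into another's body.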
If-Else
<condition>
JUMP_IF_FALSE .else
<then-block>
JUMP .end
.else:
<else-block>
.end:
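As a sketch of how a compiler might emit this shape, the helper below wraps already-compiled condition and branch blocks in the jumps shown above. `emitIfElse` and `freshLabel` are hypothetical names, and unique label suffixes are an assumption made so that nested if-else blocks do not collide.

```typescript
type Operand = number | string | boolean | null | string[];
type Instruction = [string, ...Operand[]];

let labelCounter = 0;

// Hypothetical helper: returns a label name that is unique within one compilation.
function freshLabel(prefix: string): string {
  labelCounter += 1;
  return "." + prefix + "_" + labelCounter;
}

function emitIfElse(
  condition: Instruction[],
  thenBlock: Instruction[],
  elseBlock: Instruction[]
): Instruction[] {
  const elseLabel = freshLabel("else");
  const endLabel = freshLabel("end");
  return [
    ...condition,
    ["JUMP_IF_FALSE", elseLabel],
    ...thenBlock,
    ["JUMP", endLabel], // skip the else-block after the then-block
    [elseLabel + ":"],
    ...elseBlock,
    [endLabel + ":"],
  ];
}

// if (x < 10) "small" else "large"
const branch = emitIfElse(
  [["LOAD", "x"], ["PUSH", 10], ["LT"]],
  [["PUSH", "small"]],
  [["PUSH", "large"]]
);
```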
While Loop
.loop:
<condition>
JUMP_IF_FALSE .end
<body>
JUMP .loop
.end:
For Loop
<init>
.loop:
<condition>
JUMP_IF_FALSE .end
<body>
<increment>
JUMP .loop
.end:
Continue
There is no CONTINUE opcode. Use a backward jump to the loop start:
.loop:
<condition>
JUMP_IF_FALSE .end
<early-check>
JUMP_IF_TRUE .loop ; continue
<body>
JUMP .loop
.end:
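One way to generate loops with continue support is to hand the loop's labels to the code that builds the body, so a continue becomes a jump to the start label. The sketch below assumes hypothetical helpers `emitWhile` and `freshLabel` and a hypothetical `LoopLabels` shape; it only builds the instruction list and does not run it.

```typescript
type Operand = number | string | boolean | null | string[];
type Instruction = [string, ...Operand[]];

interface LoopLabels {
  start: string;
  end: string;
}

let labelCounter = 0;

// Hypothetical helper: returns a label name that is unique within one compilation.
function freshLabel(prefix: string): string {
  labelCounter += 1;
  return "." + prefix + "_" + labelCounter;
}

// The body builder receives the loop labels so it can emit continue jumps.
function emitWhile(
  condition: Instruction[],
  body: (labels: LoopLabels) => Instruction[]
): Instruction[] {
  const labels: LoopLabels = { start: freshLabel("loop"), end: freshLabel("end") };
  return [
    [labels.start + ":"],
    ...condition,
    ["JUMP_IF_FALSE", labels.end],
    ...body(labels),
    ["JUMP", labels.start],
    [labels.end + ":"],
  ];
}

// while (i < 10) { if (skip) continue; i = i + 1 }
const whileLoop = emitWhile(
  [["LOAD", "i"], ["PUSH", 10], ["LT"]],
  (labels) => [
    ["LOAD", "skip"],
    ["JUMP_IF_TRUE", labels.start], // continue: jump back to the loop start
    ["LOAD", "i"],
    ["PUSH", 1],
    ["ADD"],
    ["STORE", "i"],
  ]
);
```

In a for loop, a continue would normally need to target the increment step rather than the loop start; that case needs an extra label and is not covered by this sketch.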
Break in Loop
Mark iterator function as break target, use BREAK opcode:
MAKE_FUNCTION () .each_body
STORE each
LOAD collection
LOAD each
<call-iterator-with-break-semantics>
HALT
.each_body:
<condition>
JUMP_IF_TRUE .done
<body>
BREAK ; exits to caller
.done:
RETURN
Short-Circuit AND
<left>
DUP
JUMP_IF_FALSE .end ; Short-circuit if false
POP
<right>
.end: ; Result on stack
Short-Circuit OR
<left>
DUP
JUMP_IF_TRUE .end ; Short-circuit if true
POP
<right>
.end: ; Result on stack
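Both short-circuit patterns differ only in the conditional jump, so a compiler can share one emitter. This is a sketch with hypothetical helper names (`emitShortCircuit`, `freshLabel`), not part of ReefVM itself.

```typescript
type Operand = number | string | boolean | null | string[];
type Instruction = [string, ...Operand[]];

let labelCounter = 0;

// Hypothetical helper: returns a label name that is unique within one compilation.
function freshLabel(prefix: string): string {
  labelCounter += 1;
  return "." + prefix + "_" + labelCounter;
}

function emitShortCircuit(
  op: "and" | "or",
  left: Instruction[],
  right: Instruction[]
): Instruction[] {
  const endLabel = freshLabel(op + "_end");
  // AND skips the right side when the left is falsy; OR skips it when truthy.
  const jump = op === "and" ? "JUMP_IF_FALSE" : "JUMP_IF_TRUE";
  return [
    ...left,
    ["DUP"], // keep a copy of the left value as the potential result
    [jump, endLabel],
    ["POP"], // discard the left value; the right side produces the result
    ...right,
    [endLabel + ":"],
  ];
}

const andCode = emitShortCircuit("and", [["LOAD", "a"]], [["LOAD", "b"]]);
const orCode = emitShortCircuit("or", [["LOAD", "a"]], [["LOAD", "b"]]);
```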
Reversing Operand Order
Use SWAP to reverse operand order for non-commutative operations:
; Compute 10 / 2 when values are in reverse order
PUSH 2
PUSH 10
SWAP ; Now: [10, 2]
DIV ; 10 / 2 = 5
; Compute "hello" - "world" (subtraction with strings coerced to numbers)
PUSH "world"
PUSH "hello"
SWAP ; Now: ["hello", "world"]
SUB ; Result based on operand order
Common Use Cases:
- Division and subtraction when operands are in wrong order
- String concatenation with specific order
- Preparing arguments for functions that care about position
Bitwise Operations
All bitwise operations work with 32-bit signed integers:
; Bitwise AND (masking)
PUSH 5
PUSH 3
BIT_AND ; → 1 (0101 & 0011 = 0001)
; Bitwise OR (combining flags)
PUSH 5
PUSH 3
BIT_OR ; → 7 (0101 | 0011 = 0111)
; Bitwise XOR (toggling bits)
PUSH 5
PUSH 3
BIT_XOR ; → 6 (0101 ^ 0011 = 0110)
; Left shift (multiply by power of 2)
PUSH 5
PUSH 2
BIT_SHL ; → 20 (5 << 2 = 5 * 4)
; Arithmetic right shift (divide by power of 2, preserves sign)
PUSH 20
PUSH 2
BIT_SHR ; → 5 (20 >> 2 = 20 / 4)
PUSH -20
PUSH 2
BIT_SHR ; → -5 (sign preserved)
; Logical right shift (zero-fill)
PUSH -1
PUSH 1
BIT_USHR ; → 2147483647 (unsigned shift)
Common Use Cases:
- Flags and bit masks: flags band MASK to test, flags bor FLAG to set
- Fast multiplication/division by powers of 2
- Color manipulation: extract RGB components
- Low-level bit manipulation for protocols or file formats
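The results listed above are the same ones JavaScript's 32-bit integer operators produce. Assuming each opcode mirrors the corresponding JavaScript operator (the guide does not state this explicitly), the examples can be cross-checked in TypeScript. The `bitwise` table and `unpackRgb` are hypothetical names used for illustration.

```typescript
// Assumption: each opcode behaves like the JavaScript operator next to it.
const bitwise = {
  BIT_AND: (a: number, b: number): number => a & b,
  BIT_OR: (a: number, b: number): number => a | b,
  BIT_XOR: (a: number, b: number): number => a ^ b,
  BIT_SHL: (a: number, b: number): number => a << b,
  BIT_SHR: (a: number, b: number): number => a >> b, // arithmetic shift, keeps the sign
  BIT_USHR: (a: number, b: number): number => a >>> b, // logical shift, zero-fills
};

// Color manipulation: split a packed 0xRRGGBB value into its components
// with shifts and masks.
function unpackRgb(color: number): [number, number, number] {
  return [(color >> 16) & 0xff, (color >> 8) & 0xff, color & 0xff];
}

const rgb = unpackRgb(0x336699); // [51, 102, 153]
```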
Runtime Type Checking (TYPE)
Get the type of a value as a string for runtime introspection:
; Basic type check
PUSH 42
TYPE ; → "number"
PUSH "hello"
TYPE ; → "string"
MAKE_ARRAY #3
TYPE ; → "array"
Type Guard Pattern (check type before operation):
; Safe addition - only add if both are numbers
LOAD x
DUP
TYPE
PUSH "number"
EQ
JUMP_IF_FALSE .not_number
LOAD y
DUP
TYPE
PUSH "number"
EQ
JUMP_IF_FALSE .cleanup_not_number
ADD ; Safe to add
JUMP .end
.cleanup_not_number:
POP ; Remove y
.not_number:
POP ; Remove x
PUSH null
.end:
Common Use Cases:
- Type validation before operations
- Polymorphic functions that handle multiple types
- Debugging and introspection
- Dynamic dispatch in DSLs
- Safe coercion with fallbacks
Try-Catch
PUSH_TRY .catch
<try-block>
POP_TRY
JUMP .end
.catch:
STORE err
<catch-block>
.end:
Try-Catch-Finally
PUSH_TRY .catch
PUSH_FINALLY .finally
<try-block>
POP_TRY
JUMP .finally ; Compiler must generate this
.catch:
STORE err
<catch-block>
JUMP .finally ; And this
.finally:
<finally-block> ; Executes in both paths
.end:
Important: The VM only jumps to finally automatically on THROW. When the try block or the catch block completes normally, the compiler must emit an explicit JUMP to finally.
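The explicit jumps are easy to forget, so it can help to centralize them in one emitter. The sketch below mirrors the layout above in array format; `emitTryCatchFinally` and `freshLabel` are hypothetical helpers, and writing PUSH_FINALLY as a tuple with a label operand is an assumption made by analogy with PUSH_TRY.

```typescript
type Operand = number | string | boolean | null | string[];
type Instruction = [string, ...Operand[]];

let labelCounter = 0;

// Hypothetical helper: returns a label name that is unique within one compilation.
function freshLabel(prefix: string): string {
  labelCounter += 1;
  return "." + prefix + "_" + labelCounter;
}

function emitTryCatchFinally(
  tryBlock: Instruction[],
  errorName: string,
  catchBlock: Instruction[],
  finallyBlock: Instruction[]
): Instruction[] {
  const catchLabel = freshLabel("catch");
  const finallyLabel = freshLabel("finally");
  return [
    ["PUSH_TRY", catchLabel],
    ["PUSH_FINALLY", finallyLabel],
    ...tryBlock,
    ["POP_TRY"],
    ["JUMP", finallyLabel], // success path: the compiler jumps to finally
    [catchLabel + ":"],
    ["STORE", errorName],
    ...catchBlock,
    ["JUMP", finallyLabel], // catch path: the compiler jumps to finally
    [finallyLabel + ":"],
    ...finallyBlock,
  ];
}

const guarded = emitTryCatchFinally(
  [["PUSH", "boom"], ["THROW"]],
  "err",
  [["LOAD", "err"], ["POP"]],
  [["PUSH", "done"], ["POP"]]
);
```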
Closures
Functions automatically capture current scope:
PUSH 0
STORE counter
MAKE_FUNCTION () .increment
STORE increment_fn
JUMP .main
.increment:
LOAD counter ; Captured variable
PUSH 1
ADD
STORE counter
LOAD counter
RETURN
.main:
LOAD increment_fn
PUSH 0
PUSH 0
CALL ; Returns 1
POP
LOAD increment_fn
PUSH 0
PUSH 0
CALL ; Returns 2 (counter persists!)
HALT
Tail Recursion
Use TAIL_CALL instead of CALL for last call:
MAKE_FUNCTION (n acc) .factorial
STORE factorial
JUMP .main
.factorial:
LOAD n
PUSH 0
LTE
JUMP_IF_FALSE .recurse
LOAD acc
RETURN
.recurse:
LOAD factorial
LOAD n
PUSH 1
SUB
LOAD n
LOAD acc
MUL
PUSH 2
PUSH 0
TAIL_CALL ; Reuses stack frame
.main:
LOAD factorial
PUSH 5
PUSH 1
PUSH 2
PUSH 0
CALL ; factorial(5, 1) = 120
HALT
Optional Function Calls (TRY_CALL)
Call function if defined, otherwise use value or name as string:
; Define optional hook
MAKE_FUNCTION () .onInit
STORE onInit
; Later: call if defined, skip if not
TRY_CALL onInit ; Calls onInit() if it's a function
; Pushes value if it exists but isn't a function
; Pushes "onInit" as string if undefined
; Use with values
PUSH 42
STORE answer
TRY_CALL answer ; Pushes 42 (not a function)
; Use with undefined
TRY_CALL unknown ; Pushes "unknown" as string
Use Cases:
- Optional hooks/callbacks in DSLs
- Shell-like languages where unknown identifiers become strings
- Templating systems with optional transformers
String Concatenation
Build strings from multiple values:
; Simple concatenation
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3 ; → "Hello World"
; With variables
PUSH "Name: "
LOAD userName
STR_CONCAT #2 ; → "Name: Alice"
; With expressions and type coercion
PUSH "Result: "
PUSH 10
PUSH 5
ADD
STR_CONCAT #2 ; → "Result: 15"
; Template-like interpolation
PUSH "User "
LOAD userId
PUSH " has "
LOAD count
PUSH " items"
STR_CONCAT #5 ; → "User 42 has 3 items"
Composability: Results can be concatenated again
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3
PUSH "!"
STR_CONCAT #2 ; → "Hello World!"
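Template-style interpolation maps directly onto STR_CONCAT: push each part in order, then concatenate with the part count. The sketch below shows one way a compiler might emit that in array format. `emitTemplate` and the `TemplatePart` type are hypothetical, and passing the count as a plain number (as in `["MAKE_ARRAY", 3]`) is assumed to apply to STR_CONCAT as well.

```typescript
type Operand = number | string | boolean | null | string[];
type Instruction = [string, ...Operand[]];

// A template part is either literal text or a variable reference.
type TemplatePart = { text: string } | { variable: string };

function emitTemplate(parts: TemplatePart[]): Instruction[] {
  const out: Instruction[] = [];
  for (const part of parts) {
    if ("text" in part) {
      out.push(["PUSH", part.text]); // literal segment
    } else {
      out.push(["LOAD", part.variable]); // interpolated variable
    }
  }
  out.push(["STR_CONCAT", parts.length]); // one count covers every part
  return out;
}

// "User {userId} has {count} items"
const message = emitTemplate([
  { text: "User " },
  { variable: "userId" },
  { text: " has " },
  { variable: "count" },
  { text: " items" },
]);
```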
Unified Access (DOT_GET)
DOT_GET provides a single opcode for accessing both arrays and dicts:
; Array access
PUSH 10
PUSH 20
PUSH 30
MAKE_ARRAY #3
PUSH 1
DOT_GET ; → 20
; Dict access
PUSH 'name'
PUSH 'Alice'
MAKE_DICT #1
PUSH 'name'
DOT_GET ; → 'Alice'
Chained access:
; Access dict['users'][0]['name']
LOAD dict
PUSH 'users'
DOT_GET ; Get users array
PUSH 0
DOT_GET ; Get first user
PUSH 'name'
DOT_GET ; Get name field
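Because every step of a chained access has the same shape, a compiler can emit it from a path list. A minimal sketch in array format, using a hypothetical helper `emitPath`:

```typescript
type Operand = number | string | boolean | null | string[];
type Instruction = [string, ...Operand[]];

// Emits LOAD root, then one PUSH key + DOT_GET pair per path segment.
function emitPath(root: string, path: (string | number)[]): Instruction[] {
  const out: Instruction[] = [["LOAD", root]];
  for (const key of path) {
    out.push(["PUSH", key]); // string key for dicts, number index for arrays
    out.push(["DOT_GET"]);
  }
  return out;
}

// dict['users'][0]['name']
const nameAccess = emitPath("dict", ["users", 0, "name"]);
```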
With variables:
LOAD data
LOAD key ; Key can be string or number
DOT_GET ; Works for both array and dict
Null safety: Returns null for missing keys or out-of-bounds indices
MAKE_ARRAY #0
PUSH 0
DOT_GET ; → null (empty array)
MAKE_DICT #0
PUSH 'key'
DOT_GET ; → null (missing key)
Key Concepts
Truthiness
Only null and false are falsy. Everything else (including 0, "", empty arrays/dicts) is truthy.
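This rule differs from JavaScript's, where 0 and "" are falsy, so a compiler written in TypeScript should not rely on the host language's truthiness when folding constants. A minimal sketch of the rule over plain, unwrapped JavaScript values (the function name `isTruthy` is hypothetical and this is not ReefVM's implementation):

```typescript
// Only null and false are falsy; every other value is truthy.
function isTruthy(value: unknown): boolean {
  return value !== null && value !== false;
}

const truthyZero = isTruthy(0); // true, unlike JavaScript's Boolean(0)
const truthyEmptyString = isTruthy(""); // true, unlike JavaScript's Boolean("")
const falsyNull = isTruthy(null); // false
```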
Type Coercion
toNumber:
- number → identity
- string → parseFloat (or 0 if invalid)
- boolean → 1 (true) or 0 (false)
- null → 0
- Others → 0
toString:
- string → identity
- number → string representation
- boolean → "true" or "false"
- null → "null"
- function → ""
- array → "[item, item]"
- dict → "{key: value, ...}"
Arithmetic ops (ADD, SUB, MUL, DIV, MOD) coerce both operands to numbers.
Bitwise ops (BIT_AND, BIT_OR, BIT_XOR, BIT_SHL, BIT_SHR, BIT_USHR) coerce both operands to 32-bit signed integers.
Comparison ops (LT, GT, LTE, GTE) coerce both operands to numbers.
Equality ops (EQ, NEQ) use type-aware comparison with deep equality for arrays/dicts.
Note: ADD does not concatenate strings; it coerces both operands to numbers. Use STR_CONCAT to join strings.
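The toNumber rules above can be sketched over plain JavaScript values. This is an illustration of the listed rules, not ReefVM's own toNumber (which operates on Value objects); the name `coerceToNumber` is hypothetical, and treating a NaN result from parseFloat as "invalid" is an assumption.

```typescript
function coerceToNumber(value: unknown): number {
  if (typeof value === "number") return value; // identity
  if (typeof value === "string") {
    const parsed = parseFloat(value);
    return isNaN(parsed) ? 0 : parsed; // unparsable strings become 0
  }
  if (typeof value === "boolean") return value ? 1 : 0;
  return 0; // null and every other type
}

// Under these rules, "hello" - "world" computes 0 - 0.
const difference = coerceToNumber("hello") - coerceToNumber("world");
```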
Scope
- Variables resolved through parent scope chain
- STORE updates existing variable or creates in current scope
- Functions capture scope at definition time
Identifiers
Variable and function parameter names support Unicode and emoji:
- Valid: 💎, 🌟, 変数, counter, _private
- Invalid: Cannot start with digits, ., #, @, or ...
- Invalid: Cannot contain whitespace or special chars: ;, (), [], {}, =, ', "
Break Semantics
- CALL marks current frame as break target
- BREAK unwinds call stack to that target
- Used for Ruby-style iterator pattern
Parameter Binding Priority
For function calls, parameters bound in order:
- Positional argument (if provided)
- Named argument (if provided and matches param name)
- Default value (if defined)
- Null
Exception Handlers
- PUSH_TRY uses absolute addresses for catch blocks
- Nested try blocks form a stack
- THROW unwinds to most recent handler and jumps to finally (if present) or catch
- VM does NOT automatically jump to finally on success - compiler must generate JUMPs
- Finally execution in all cases is compiler's responsibility, not VM's
Calling Convention
All calls (including native functions) push arguments in order:
- Function
- Positional args (in order)
- Named args (key1, val1, key2, val2, ...)
- Positional count (as number)
- Named count (as number)
- CALL or TAIL_CALL
Native functions use the same calling convention as Reef functions. They are registered into scope and called via LOAD + CALL.
Registering Native Functions
Native TypeScript functions are registered into the VM's scope and accessed like regular variables.
Method 1: Pass to run() or VM constructor
const result = await run(bytecode, {
add: (a: number, b: number) => a + b,
greet: (name: string) => `Hello, ${name}!`
})
// Or with VM
const vm = new VM(bytecode, { add, greet })
Method 2: Register after construction
const vm = new VM(bytecode)
vm.set('add', (a: number, b: number) => a + b)
await vm.run()
Method 3: Value-based functions (for full control)
vm.setValueFunction('customOp', (a: Value, b: Value): Value => {
return { type: 'number', value: toNumber(a) + toNumber(b) }
})
Auto-wrapping: vm.set() automatically converts between native TypeScript types and ReefVM Value types. Both sync and async functions work.
Usage in bytecode:
; Positional arguments
LOAD add ; Load native function from scope
PUSH 5
PUSH 10
PUSH 2 ; positionalCount
PUSH 0 ; namedCount
CALL ; Call like any other function
; Named arguments
LOAD greet
PUSH "name"
PUSH "Alice"
PUSH "greeting"
PUSH "Hi"
PUSH 0 ; positionalCount
PUSH 2 ; namedCount
CALL ; → "Hi, Alice!"
Named Arguments: Native functions support named arguments. Parameter names are extracted from the function signature at call time, and arguments are bound using the same priority as Reef functions (named arg > positional arg > default > null).
@named Pattern: Parameters whose names start with "at" followed by an uppercase letter (e.g., atOptions, atNamed) collect unmatched named arguments:
// Basic @named - collects all named args
vm.set('greet', (atNamed: any = {}) => {
return `Hello, ${atNamed.name || 'World'}!`
})
// Mixed positional and @named
vm.set('configure', (name: string, atOptions: any = {}) => {
return {
name,
debug: atOptions.debug || false,
port: atOptions.port || 3000
}
})
Bytecode example:
; Call with mixed positional and named args
LOAD configure
PUSH "myApp" ; positional arg → name
PUSH "debug"
PUSH true
PUSH "port"
PUSH 8080
PUSH 1 ; 1 positional arg
PUSH 2 ; 2 named args (debug, port)
CALL ; atOptions receives {debug: true, port: 8080}
Named arguments that match fixed parameter names are bound to those parameters. Remaining unmatched named arguments are collected into the atXxx parameter as a plain JavaScript object.
Calling Functions from TypeScript
You can call both Reef and native functions from TypeScript using vm.call():
const bytecode = toBytecode(`
MAKE_FUNCTION (name greeting="Hello") .greet
STORE greet
HALT
.greet:
LOAD greeting
PUSH " "
LOAD name
PUSH "!"
STR_CONCAT #4
RETURN
`)
const vm = new VM(bytecode, {
log: (msg: string) => console.log(msg) // Native function
})
await vm.run()
// Call Reef function with positional arguments
const result1 = await vm.call('greet', 'Alice')
// Returns: "Hello Alice!"
// Call Reef function with named arguments (pass as final object)
const result2 = await vm.call('greet', 'Bob', { greeting: 'Hi' })
// Returns: "Hi Bob!"
// Call Reef function with only named arguments
const result3 = await vm.call('greet', { name: 'Carol', greeting: 'Hey' })
// Returns: "Hey Carol!"
// Call native function
await vm.call('log', 'Hello from TypeScript!')
How it works:
- vm.call(functionName, ...args) looks up the function (Reef or native) in the VM's scope
- For Reef functions: converts to callable JavaScript function
- For native functions: calls directly
- Arguments are automatically converted to ReefVM Values
- Returns the result (automatically converted back to JavaScript types)
Named arguments: Pass a plain object as the final argument to provide named arguments. If the last argument is a non-array object, it's treated as named arguments. All preceding arguments are treated as positional.
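The rule for the final argument can be sketched as a small splitting function. This is an illustration of the rule as described, not ReefVM's source; `splitCallArgs` and `NamedArgs` are hypothetical names, and treating a trailing null as a positional argument is an assumption.

```typescript
type NamedArgs = { [key: string]: unknown };

function splitCallArgs(args: unknown[]): {
  positional: unknown[];
  named: NamedArgs;
} {
  const last = args[args.length - 1];
  // A trailing non-array object is treated as the named-argument bag.
  const lastIsNamed =
    typeof last === "object" && last !== null && !Array.isArray(last);
  if (lastIsNamed) {
    return { positional: args.slice(0, -1), named: last as NamedArgs };
  }
  return { positional: args, named: {} };
}

// vm.call('greet', 'Bob', { greeting: 'Hi' })
const mixedArgs = splitCallArgs(["Bob", { greeting: "Hi" }]);
// vm.call('greet', 'Alice')
const positionalOnly = splitCallArgs(["Alice"]);
```

One consequence of this rule is that a dict intended as the last positional argument would be read as named arguments instead, so callers may need to account for that case.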
Type conversion: Arguments and return values are automatically converted between JavaScript types and ReefVM Values:
- Primitives: number, string, boolean, null
- Arrays: converted recursively
- Objects: converted to ReefVM dicts
- Functions: Reef functions are converted to callable JavaScript functions
REPL Mode (Incremental Compilation)
ReefVM supports incremental bytecode execution for building REPLs. This allows you to execute code line-by-line while preserving scope and avoiding re-execution of side effects.
The Problem: By default, vm.run() resets the program counter (PC) to 0, re-executing all previous bytecode. This makes it impossible to implement a REPL where each line executes only once.
The Solution: Use vm.continue() to resume execution from where you left off:
// Line 1: Define variable
const line1 = toBytecode([
["PUSH", 42],
["STORE", "x"]
])
const vm = new VM(line1)
await vm.run() // Execute first line
// Line 2: Use the variable
const line2 = toBytecode([
["LOAD", "x"],
["PUSH", 10],
["ADD"]
])
vm.appendBytecode(line2) // Append new bytecode with proper constant remapping
await vm.continue() // Execute ONLY the new bytecode
// Result: 52 (42 + 10)
// The first line never re-executed!
Key methods:
- vm.run(): Resets PC to 0 and runs from the beginning (normal execution)
- vm.continue(): Continues from current PC (REPL mode)
- vm.appendBytecode(bytecode): Helper that properly appends bytecode with constant index remapping
Important: Don't use HALT in REPL mode! The VM naturally stops when it runs out of instructions. Using HALT sets vm.stopped = true, which prevents continue() from resuming.
Example REPL pattern:
const vm = new VM(toBytecode([]), { /* native functions */ })
while (true) {
const input = await getUserInput() // Get next line from user
const bytecode = compileLine(input) // Compile to bytecode (no HALT!)
vm.appendBytecode(bytecode) // Append to VM
const result = await vm.continue() // Execute only the new code
console.log(fromValue(result)) // Show result to user
}
This pattern ensures:
- Variables persist between lines
- Side effects (like echo or function calls) only run once
- Previous bytecode never re-executes
- Scope accumulates across all lines
Empty Stack
- RETURN with empty stack returns null
- HALT with empty stack returns null