ReefVM/GUIDE.md
2025-11-08 00:01:21 -08:00

Reef Compiler Guide

Quick reference for compiling to Reef bytecode.

Bytecode Formats

ReefVM supports two bytecode formats:

  1. String format: Human-readable text with opcodes and operands
  2. Array format: TypeScript arrays with typed tuples for programmatic generation

Both formats are compiled using the same toBytecode() function.

Bytecode Syntax

Instructions

OPCODE operand     ; comment

Operand Types

Immediate numbers (#N): Counts, relative offsets, or absolute instruction indices, depending on the opcode

  • MAKE_ARRAY #3 - count of 3 items
  • JUMP #5 - relative offset of 5 instructions (prefer labels)
  • PUSH_TRY #10 - absolute instruction index (prefer labels)

Labels (.name): Symbolic addresses resolved at parse time

  • .label: - define label at current position
  • JUMP .loop - jump to label
  • MAKE_FUNCTION (x) .body - function body at label
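
Resolving labels at parse time can be pictured as a two-pass walk over the instruction list. The sketch below is only an illustration of that idea: the `resolveLabels` name, the tuple shapes, and the choice to resolve every label to an absolute instruction index are assumptions, not ReefVM's actual parser (which may encode jumps as relative offsets).

```typescript
// Hypothetical sketch of parse-time label resolution; not ReefVM's real parser.
type Operand = number | string | boolean | null;
type Instr = [string, ...Operand[]];

// Opcodes whose operand is a label reference (MAKE_FUNCTION is left out for brevity).
const LABEL_OPS = ["JUMP", "JUMP_IF_FALSE", "JUMP_IF_TRUE", "PUSH_TRY", "PUSH_FINALLY"];

function resolveLabels(program: Instr[]): Instr[] {
  const addresses: { [label: string]: number } = {};
  const body: Instr[] = [];

  // Pass 1: a definition such as [".loop:"] emits no instruction;
  // it names the index of the next real instruction.
  for (const instr of program) {
    const head = instr[0];
    const isLabelDef =
      instr.length === 1 && head.charAt(0) === "." && head.charAt(head.length - 1) === ":";
    if (isLabelDef) {
      addresses[head.slice(0, -1)] = body.length;
    } else {
      body.push(instr);
    }
  }

  // Pass 2: replace label references (".loop") with the recorded index.
  return body.map((instr): Instr => {
    const op = instr[0];
    const target = instr[1];
    if (LABEL_OPS.indexOf(op) !== -1 && typeof target === "string") {
      if (!(target in addresses)) throw new Error(`Unknown label ${target}`);
      return [op, addresses[target]];
    }
    return instr;
  });
}
```

Because definitions are collected before references are rewritten, forward references such as a jump to `.end` that is defined later resolve without any special handling.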

Variable names: Plain identifiers (Unicode and emoji are supported)

  • LOAD counter - load variable
  • STORE result - store variable
  • LOAD 💎 - load emoji variable
  • STORE 変数 - store Unicode variable

Constants: Literals added to constants pool

  • Numbers: PUSH 42, PUSH 3.14
  • Strings: PUSH "hello" or PUSH 'world'
  • Booleans: PUSH true, PUSH false
  • Null: PUSH null

Array Format

The programmatic array format uses TypeScript tuples for type safety:

import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],        // Atom values: number | string | boolean | null
  ["STORE", "x"],      // Variable names as strings
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)

Operand Types in Array Format

Atoms (number | string | boolean | null): Constants for PUSH

["PUSH", 42]
["PUSH", "hello"]
["PUSH", true]
["PUSH", null]

Variable names: String identifiers

["LOAD", "counter"]
["STORE", "result"]

Label definitions: Single-element arrays whose string starts with . and ends with :

[".loop:"]
[".end:"]
[".function_body:"]

Label references: Strings in jump/function instructions

["JUMP", ".loop"]
["JUMP_IF_FALSE", ".end"]
["MAKE_FUNCTION", ["x", "y"], ".body"]
["PUSH_TRY", ".catch"]

Counts: Numbers for array/dict construction

["MAKE_ARRAY", 3]    // Pop 3 items
["MAKE_DICT", 2]     // Pop 2 key-value pairs

Functions in Array Format

// Basic function
["MAKE_FUNCTION", ["x", "y"], ".body"]

// With defaults
["MAKE_FUNCTION", ["x", "y=10"], ".body"]

// Variadic
["MAKE_FUNCTION", ["...args"], ".body"]

// Named args
["MAKE_FUNCTION", ["@opts"], ".body"]

// Mixed
["MAKE_FUNCTION", ["x", "y=5", "...rest", "@opts"], ".body"]

Complete Example

const factorial = toBytecode([
  ["MAKE_FUNCTION", ["n", "acc=1"], ".fact"],
  ["STORE", "factorial"],
  ["JUMP", ".main"],

  [".fact:"],
  ["LOAD", "n"],
  ["PUSH", 0],
  ["LTE"],
  ["JUMP_IF_FALSE", ".recurse"],
  ["LOAD", "acc"],
  ["RETURN"],

  [".recurse:"],
  ["LOAD", "factorial"],
  ["LOAD", "n"],
  ["PUSH", 1],
  ["SUB"],
  ["LOAD", "n"],
  ["LOAD", "acc"],
  ["MUL"],
  ["PUSH", 2],
  ["PUSH", 0],
  ["TAIL_CALL"],

  [".main:"],
  ["LOAD", "factorial"],
  ["PUSH", 5],
  ["PUSH", 1],
  ["PUSH", 0],
  ["CALL"],
  ["HALT"]
])

const result = await run(factorial)  // { type: "number", value: 120 }

String Format

Functions

MAKE_FUNCTION (x y) .body       ; Basic
MAKE_FUNCTION (x=10 y=20) .body ; Defaults
MAKE_FUNCTION (x ...rest) .body ; Variadic
MAKE_FUNCTION (x @named) .body  ; Named args
MAKE_FUNCTION (x ...rest @named) .body ; Both

Function Calls

Stack order (bottom to top):

LOAD fn
PUSH arg1           ; Positional args
PUSH arg2
PUSH "name"         ; Named arg key
PUSH "value"        ; Named arg value
PUSH 2              ; Positional count
PUSH 1              ; Named count
CALL

Opcodes

Stack

  • PUSH <const> - Push constant
  • POP - Remove top
  • DUP - Duplicate top
  • SWAP - Swap top two values
  • TYPE - Pop value, push its type as string

Variables

  • LOAD <name> - Push variable value (throws if not found)
  • TRY_LOAD <name> - Push variable value if found, otherwise push name as string (never throws)
  • STORE <name> - Pop and store in variable

Arithmetic

  • ADD, SUB, MUL, DIV, MOD - Binary ops (pop 2, push result)

Bitwise

  • BIT_AND, BIT_OR, BIT_XOR - Bitwise logical ops (pop 2, push result)
  • BIT_SHL, BIT_SHR, BIT_USHR - Bitwise shift ops (pop 2, push result)

Comparison

  • EQ, NEQ, LT, GT, LTE, GTE - Pop 2, push boolean

Logic

  • NOT - Pop 1, push !value

Control Flow

  • JUMP .label - Unconditional jump
  • JUMP_IF_FALSE .label - Jump if top is false or null (pops value)
  • JUMP_IF_TRUE .label - Jump if top is truthy (pops value)
  • HALT - Stop execution of the program

Functions

  • MAKE_FUNCTION (params) .body - Create function, push to stack
  • CALL - Call function (see calling convention above)
  • TAIL_CALL - Tail-recursive call (no stack growth)
  • RETURN - Return from function (pops return value)
  • TRY_CALL <name> - If the variable holds a function, call it; if it holds any other value, push that value; if it is undefined, push the name as a string
  • BREAK - Exit iterator/loop (unwinds to break target)

Arrays

  • MAKE_ARRAY #N - Pop N items, push array
  • ARRAY_GET - Pop index and array, push element
  • ARRAY_SET - Pop value, index, array; mutate array
  • ARRAY_PUSH - Pop value and array, append to array
  • ARRAY_LEN - Pop array, push length

Dicts

  • MAKE_DICT #N - Pop N key-value pairs, push dict
  • DICT_GET - Pop key and dict, push value (or null)
  • DICT_SET - Pop value, key, dict; mutate dict
  • DICT_HAS - Pop key and dict, push boolean

Unified Access

  • DOT_GET - Pop index/key and array/dict, push value (null if missing)

Strings

  • STR_CONCAT #N - Pop N values, convert to strings, concatenate, push result

Exceptions

  • PUSH_TRY .catch - Register exception handler
  • PUSH_FINALLY .finally - Add finally to current handler
  • POP_TRY - Remove handler (try succeeded)
  • THROW - Throw exception (pops error value)

Compiler Patterns

Function Definitions

When defining functions, you must prevent the PC from "falling through" into the function body during sequential execution. There are two standard patterns:

Pattern 1: JUMP over function bodies (Recommended)

MAKE_FUNCTION (params) .body
STORE function_name
JUMP .end              ; Skip over function body
.body:
  <function code>
  RETURN
.end:
  <continue with program>

Pattern 2: Function bodies after HALT

MAKE_FUNCTION (params) .body
STORE function_name
<use the function>
HALT                   ; Stop execution before function bodies
.body:
  <function code>
  RETURN

Important: Pattern 2 only works if you HALT before reaching function bodies. Pattern 1 is more flexible and required for:

  • Defining multiple functions before using them
  • REPL mode (incremental execution)
  • Any case where execution continues after defining a function

Why? MAKE_FUNCTION creates a function value but doesn't jump to the body—it just stores the body's address. Without JUMP or HALT, the PC increments into the function body and executes it as top-level code.

If-Else

<condition>
JUMP_IF_FALSE .else
  <then-block>
  JUMP .end
.else:
  <else-block>
.end:
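
A compiler that emits this pattern more than once needs a distinct label pair per conditional, since reusing .else and .end would make the jump targets ambiguous. The following is a minimal sketch in the array format, assuming a hypothetical `freshLabel` counter and `emitIfElse` helper (neither is part of ReefVM):

```typescript
// Hypothetical code-generation helper; names are illustrative only.
type Ins = [string] | [string, number | string | boolean | null];

let labelCounter = 0;

// Returns a label reference such as ".else_0"; append ":" to define it.
function freshLabel(prefix: string): string {
  return `.${prefix}_${labelCounter++}`;
}

function emitIfElse(cond: Ins[], thenBlock: Ins[], elseBlock: Ins[]): Ins[] {
  const elseLabel = freshLabel("else");
  const endLabel = freshLabel("end");
  return [
    ...cond,
    ["JUMP_IF_FALSE", elseLabel], // falsy condition: skip the then-block
    ...thenBlock,
    ["JUMP", endLabel],           // then-block done: skip the else-block
    [`${elseLabel}:`],
    ...elseBlock,
    [`${endLabel}:`],
  ];
}
```

The returned tuples have the same shape as the array format shown earlier, so nested conditionals can be built by passing one `emitIfElse` result as the block of another.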

While Loop

.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  JUMP .loop
.end:

For Loop

<init>
.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  <increment>
  JUMP .loop
.end:

Continue

No CONTINUE opcode. Use a backward jump to the loop start (in a for loop, jump to the increment step instead so it is not skipped):

.loop:
  <condition>
  JUMP_IF_FALSE .end
  <early-check>
  JUMP_IF_TRUE .loop    ; continue
  <body>
  JUMP .loop
.end:

Break in Loop

Mark iterator function as break target, use BREAK opcode:

MAKE_FUNCTION () .each_body
STORE each
LOAD collection
LOAD each
<call-iterator-with-break-semantics>
HALT

.each_body:
  <condition>
  JUMP_IF_TRUE .done
  <body>
  BREAK                  ; exits to caller
.done:
  RETURN

Short-Circuit AND

<left>
DUP
JUMP_IF_FALSE .end      ; Short-circuit if false
POP
<right>
.end:                    ; Result on stack

Short-Circuit OR

<left>
DUP
JUMP_IF_TRUE .end       ; Short-circuit if true
POP
<right>
.end:                    ; Result on stack
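
Both patterns leave the deciding operand itself on the stack rather than a boolean. A small TypeScript model of that behaviour, assuming the truthiness rule stated in this guide (only null and false are falsy); the function names are hypothetical:

```typescript
// Models the DUP / JUMP_IF_* / POP sequences above; not VM code.
type Val = number | string | boolean | null;

// Reef truthiness as documented: only null and false are falsy.
const isTruthy = (v: Val): boolean => v !== null && v !== false;

// AND: keep <left> if it is falsy, otherwise evaluate and keep <right>.
function shortCircuitAnd(left: Val, right: () => Val): Val {
  return isTruthy(left) ? right() : left;
}

// OR: keep <left> if it is truthy, otherwise evaluate and keep <right>.
function shortCircuitOr(left: Val, right: () => Val): Val {
  return isTruthy(left) ? left : right();
}
```

Note that under these rules `shortCircuitOr(0, ...)` keeps 0, because 0 is truthy in Reef.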

Reversing Operand Order

Use SWAP to reverse operand order for non-commutative operations:

; Compute 10 / 2 when values are in reverse order
PUSH 2
PUSH 10
SWAP                     ; Now: [10, 2]
DIV                      ; 10 / 2 = 5

; Compute "hello" - "world" (non-numeric strings coerce to 0)
PUSH "world"
PUSH "hello"
SWAP                     ; Now: ["hello", "world"]
SUB                      ; 0 - 0 = 0, since neither string parses as a number

Common Use Cases:

  • Division and subtraction when operands are in wrong order
  • String concatenation with specific order
  • Preparing arguments for functions that care about position

Bitwise Operations

All bitwise operations work with 32-bit signed integers:

; Bitwise AND (masking)
PUSH 5
PUSH 3
BIT_AND                  ; → 1 (0101 & 0011 = 0001)

; Bitwise OR (combining flags)
PUSH 5
PUSH 3
BIT_OR                   ; → 7 (0101 | 0011 = 0111)

; Bitwise XOR (toggling bits)
PUSH 5
PUSH 3
BIT_XOR                  ; → 6 (0101 ^ 0011 = 0110)

; Left shift (multiply by power of 2)
PUSH 5
PUSH 2
BIT_SHL                  ; → 20 (5 << 2 = 5 * 4)

; Arithmetic right shift (divide by power of 2, preserves sign)
PUSH 20
PUSH 2
BIT_SHR                  ; → 5 (20 >> 2 = 20 / 4)

PUSH -20
PUSH 2
BIT_SHR                  ; → -5 (sign preserved)

; Logical right shift (zero-fill)
PUSH -1
PUSH 1
BIT_USHR                 ; → 2147483647 (unsigned shift)

Common Use Cases:

  • Flags and bit masks: BIT_AND with a mask to test a flag, BIT_OR with a flag to set it
  • Fast multiplication/division by powers of 2
  • Color manipulation: extract RGB components
  • Low-level bit manipulation for protocols or file formats
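
The flag and color use cases can be tried out in TypeScript, whose bitwise operators also operate on 32-bit signed integers. This is a sketch under the assumption that ReefVM's opcodes follow the same semantics, which the results listed above are consistent with; the flag names and `unpackRgb` are made up for illustration.

```typescript
// Illustrative only: TypeScript operators standing in for the BIT_* opcodes.
const READ = 1;  // 001
const WRITE = 2; // 010
const EXEC = 4;  // 100

let flags = READ;
flags = flags | WRITE;                  // BIT_OR: set a flag
const canWrite = (flags & WRITE) !== 0; // BIT_AND: test a flag
const canExec = (flags & EXEC) !== 0;

// Extract the components of a packed 0xRRGGBB color with shifts and masks.
function unpackRgb(color: number): [number, number, number] {
  return [(color >> 16) & 0xff, (color >> 8) & 0xff, color & 0xff];
}

const [red, green, blue] = unpackRgb(0x336699); // 0x33, 0x66, 0x99
```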

Runtime Type Checking (TYPE)

Get the type of a value as a string for runtime introspection:

; Basic type check
PUSH 42
TYPE                     ; → "number"

PUSH "hello"
TYPE                     ; → "string"

MAKE_ARRAY #0            ; Empty array, so no items need to be on the stack
TYPE                     ; → "array"

Type Guard Pattern (check type before operation):

; Safe addition - only add if both are numbers
LOAD x
DUP
TYPE
PUSH "number"
EQ
JUMP_IF_FALSE .not_number

LOAD y
DUP
TYPE
PUSH "number"
EQ
JUMP_IF_FALSE .cleanup_not_number

ADD                      ; Safe to add
JUMP .end

.cleanup_not_number:
  POP                    ; Remove y
.not_number:
  POP                    ; Remove x
  PUSH null
.end:

Common Use Cases:

  • Type validation before operations
  • Polymorphic functions that handle multiple types
  • Debugging and introspection
  • Dynamic dispatch in DSLs
  • Safe coercion with fallbacks

Try-Catch

PUSH_TRY .catch
  <try-block>
  POP_TRY
  JUMP .end
.catch:
  STORE err
  <catch-block>
.end:

Try-Catch-Finally

PUSH_TRY .catch
PUSH_FINALLY .finally
  <try-block>
  POP_TRY
  JUMP .finally         ; Compiler must generate this
.catch:
  STORE err
  <catch-block>
  JUMP .finally         ; And this
.finally:
  <finally-block>       ; Executes in both paths
.end:

Important: VM only auto-jumps to finally on THROW. For successful try/catch, compiler must explicitly JUMP to finally.
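
Since the explicit jumps are the compiler's job, it can help to generate the whole shape from one helper. Below is a minimal sketch in the array format; `emitTryCatchFinally` is a hypothetical name, and the `["PUSH_FINALLY", label]` tuple is an assumption modelled on the documented `["PUSH_TRY", ".catch"]` form.

```typescript
// Hypothetical emitter for the try/catch/finally layout shown above.
type TryIns = [string] | [string, number | string | boolean | null];

function emitTryCatchFinally(
  tryBlock: TryIns[],
  catchBlock: TryIns[],
  finallyBlock: TryIns[],
  id: number, // caller-supplied suffix that keeps labels unique
): TryIns[] {
  const catchLabel = `.catch_${id}`;
  const finallyLabel = `.finally_${id}`;
  return [
    ["PUSH_TRY", catchLabel],
    ["PUSH_FINALLY", finallyLabel], // assumed array-format spelling
    ...tryBlock,
    ["POP_TRY"],
    ["JUMP", finallyLabel],         // success path: explicit jump
    [`${catchLabel}:`],
    ["STORE", "err"],
    ...catchBlock,
    ["JUMP", finallyLabel],         // catch path: explicit jump
    [`${finallyLabel}:`],
    ...finallyBlock,
  ];
}
```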

Closures

Functions automatically capture current scope:

PUSH 0
STORE counter
MAKE_FUNCTION () .increment
STORE increment_fn
JUMP .main

.increment:
  LOAD counter          ; Captured variable
  PUSH 1
  ADD
  STORE counter
  LOAD counter
  RETURN

.main:
  LOAD increment_fn
  PUSH 0
  PUSH 0
  CALL                  ; Returns 1
  POP
  LOAD increment_fn
  PUSH 0
  PUSH 0
  CALL                  ; Returns 2 (counter persists!)
  HALT

Tail Recursion

Use TAIL_CALL instead of CALL when the call is the last action of a function:

MAKE_FUNCTION (n acc) .factorial
STORE factorial
JUMP .main

.factorial:
  LOAD n
  PUSH 0
  LTE
  JUMP_IF_FALSE .recurse
  LOAD acc
  RETURN
.recurse:
  LOAD factorial
  LOAD n
  PUSH 1
  SUB
  LOAD n
  LOAD acc
  MUL
  PUSH 2
  PUSH 0
  TAIL_CALL             ; Reuses stack frame

.main:
  LOAD factorial
  PUSH 5
  PUSH 1
  PUSH 2
  PUSH 0
  CALL                  ; factorial(5, 1) = 120
  HALT

Optional Function Calls (TRY_CALL)

Call function if defined, otherwise use value or name as string:

; Define optional hook
MAKE_FUNCTION () .onInit
STORE onInit

; Later: call if defined, skip if not
TRY_CALL onInit       ; Calls onInit() if it's a function
                       ; Pushes value if it exists but isn't a function
                       ; Pushes "onInit" as string if undefined

; Use with values
PUSH 42
STORE answer
TRY_CALL answer        ; Pushes 42 (not a function)

; Use with undefined
TRY_CALL unknown       ; Pushes "unknown" as string

Use Cases:

  • Optional hooks/callbacks in DSLs
  • Shell-like languages where unknown identifiers become strings
  • Templating systems with optional transformers

String Concatenation

Build strings from multiple values:

; Simple concatenation
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3           ; → "Hello World"

; With variables
PUSH "Name: "
LOAD userName
STR_CONCAT #2           ; → "Name: Alice"

; With expressions and type coercion
PUSH "Result: "
PUSH 10
PUSH 5
ADD
STR_CONCAT #2           ; → "Result: 15"

; Template-like interpolation
PUSH "User "
LOAD userId
PUSH " has "
LOAD count
PUSH " items"
STR_CONCAT #5           ; → "User 42 has 3 items"

Composability: Results can be concatenated again

PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3
PUSH "!"
STR_CONCAT #2           ; → "Hello World!"

Unified Access (DOT_GET)

DOT_GET provides a single opcode for accessing both arrays and dicts:

; Array access
PUSH 10
PUSH 20
PUSH 30
MAKE_ARRAY #3
PUSH 1
DOT_GET                 ; → 20

; Dict access
PUSH 'name'
PUSH 'Alice'
MAKE_DICT #1
PUSH 'name'
DOT_GET                 ; → 'Alice'

Chained access:

; Access dict['users'][0]['name']
LOAD dict
PUSH 'users'
DOT_GET                 ; Get users array
PUSH 0
DOT_GET                 ; Get first user
PUSH 'name'
DOT_GET                 ; Get name field

With variables:

LOAD data
LOAD key                ; Key can be string or number
DOT_GET                 ; Works for both array and dict

Null safety: Returns null for missing keys or out-of-bounds indices

MAKE_ARRAY #0
PUSH 0
DOT_GET                 ; → null (empty array)

MAKE_DICT #0
PUSH 'key'
DOT_GET                 ; → null (missing key)
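
The lookup rules above can be summarised in a few lines of TypeScript. This is a sketch, not the VM's implementation: plain arrays and objects stand in for Reef arrays and dicts, and throwing on a non-container target is an assumption, since the guide does not say what DOT_GET does in that case.

```typescript
// Plain JS values standing in for Reef values in this sketch.
type Item = number | string | boolean | null | Item[] | { [key: string]: Item };

function dotGet(target: Item, key: number | string): Item {
  if (Array.isArray(target)) {
    // Out-of-bounds or non-numeric indices yield null.
    const element = typeof key === "number" ? target[key] : undefined;
    return element === undefined ? null : element;
  }
  if (typeof target === "object" && target !== null) {
    // Missing keys yield null.
    const name = String(key);
    return Object.prototype.hasOwnProperty.call(target, name) ? target[name] : null;
  }
  // Assumption: the guide does not specify this case.
  throw new Error("DOT_GET target must be an array or dict");
}

// Chained access, mirroring dict['users'][0]['name'].
const data: Item = { users: [{ name: "Alice" }] };
const firstName = dotGet(dotGet(dotGet(data, "users"), 0), "name");
```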

Key Concepts

Truthiness

Only null and false are falsy. Everything else (including 0, "", empty arrays/dicts) is truthy.

Type Coercion

toNumber:

  • number → identity
  • string → parseFloat (or 0 if invalid)
  • boolean → 1 (true) or 0 (false)
  • null → 0
  • Others → 0
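
The toNumber rules translate almost directly into a switch. The sketch below assumes a simplified Value shape inferred from the `{ type, value }` results shown in this guide; `ReefValue` and `toNumberSketch` are illustrative names, not ReefVM exports.

```typescript
// Simplified, assumed shape of a ReefVM value for illustration.
type ReefValue =
  | { type: "number"; value: number }
  | { type: "string"; value: string }
  | { type: "boolean"; value: boolean }
  | { type: "null" }
  | { type: "array" | "dict" | "function" };

function toNumberSketch(v: ReefValue): number {
  switch (v.type) {
    case "number":
      return v.value;
    case "string": {
      const parsed = parseFloat(v.value);
      return isNaN(parsed) ? 0 : parsed; // invalid strings become 0
    }
    case "boolean":
      return v.value ? 1 : 0;
    default:
      return 0; // null and every other type
  }
}
```

One consequence of using parseFloat is that a string with a numeric prefix, such as "12abc", converts to 12 rather than 0.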

toString:

  • string → identity
  • number → string representation
  • boolean → "true" or "false"
  • null → "null"
  • function → ""
  • array → "[item, item]"
  • dict → "{key: value, ...}"

Arithmetic ops (ADD, SUB, MUL, DIV, MOD) coerce both operands to numbers.

Bitwise ops (BIT_AND, BIT_OR, BIT_XOR, BIT_SHL, BIT_SHR, BIT_USHR) coerce both operands to 32-bit signed integers.

Comparison ops (LT, GT, LTE, GTE) coerce both operands to numbers.

Equality ops (EQ, NEQ) use type-aware comparison with deep equality for arrays/dicts.

Note: ADD does not concatenate strings; it coerces both operands to numbers. Use STR_CONCAT to build strings.

Scope

  • Variables resolved through parent scope chain
  • STORE updates existing variable or creates in current scope
  • Functions capture scope at definition time

Identifiers

Variable and function parameter names support Unicode and emoji:

  • Valid: 💎, 🌟, 変数, counter, _private
  • Invalid: Cannot start with digits, ., #, @, or ...
  • Invalid: Cannot contain whitespace or special chars: ;, (), [], {}, =, ', "

Break Semantics

  • CALL marks current frame as break target
  • BREAK unwinds call stack to that target
  • Used for Ruby-style iterator pattern

Parameter Binding Priority

For function calls, parameters bound in order:

  1. Positional argument (if provided)
  2. Named argument (if provided and matches param name)
  3. Default value (if defined)
  4. Null
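
The priority order can be sketched as a single pass over the parameter list. The helper below is hypothetical and covers only fixed parameters; variadic (...rest) and @named parameters are left out.

```typescript
// Illustrative model of the binding order above; not ReefVM code.
type Arg = number | string | boolean | null;

interface Param {
  name: string;
  defaultValue?: Arg; // undefined means "no default"
}

function bindParams(
  params: Param[],
  positional: Arg[],
  named: { [name: string]: Arg },
): { [name: string]: Arg } {
  const bound: { [name: string]: Arg } = {};
  params.forEach((param, index) => {
    if (index < positional.length) {
      bound[param.name] = positional[index];      // 1. positional argument
    } else if (Object.prototype.hasOwnProperty.call(named, param.name)) {
      bound[param.name] = named[param.name];      // 2. matching named argument
    } else if (param.defaultValue !== undefined) {
      bound[param.name] = param.defaultValue;     // 3. default value
    } else {
      bound[param.name] = null;                   // 4. null
    }
  });
  return bound;
}
```

For example, with parameters `name`, `greeting="Hello"`, and `suffix`, one positional argument and a named `greeting` bind `name` positionally, `greeting` by name, and leave `suffix` as null.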

Exception Handlers

  • PUSH_TRY uses absolute addresses for catch blocks
  • Nested try blocks form a stack
  • THROW unwinds to most recent handler and jumps to finally (if present) or catch
  • VM does NOT automatically jump to finally on success - compiler must generate JUMPs
  • Finally execution in all cases is compiler's responsibility, not VM's

Calling Convention

All calls (including calls to native functions) set up the stack in this order:

  1. Function
  2. Positional args (in order)
  3. Named args (key1, val1, key2, val2, ...)
  4. Positional count (as number)
  5. Named count (as number)
  6. CALL or TAIL_CALL

Native functions use the same calling convention as Reef functions. They are registered into scope and called via LOAD + CALL.
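
A compiler back end can wrap this convention in one helper so the two counts never drift out of sync with the pushed arguments. A minimal sketch in the array format, assuming constant arguments only; `emitCall` is a hypothetical name:

```typescript
// Hypothetical helper: builds the array-format instructions for one call.
type Atom = number | string | boolean | null;
type CallIns = [string] | [string, Atom];

function emitCall(
  fnName: string,
  positional: Atom[],
  named: { [key: string]: Atom } = {},
): CallIns[] {
  const out: CallIns[] = [["LOAD", fnName]];       // 1. function
  for (const arg of positional) {
    out.push(["PUSH", arg]);                       // 2. positional args
  }
  const keys = Object.keys(named);
  for (const key of keys) {
    out.push(["PUSH", key]);                       // 3. named key
    out.push(["PUSH", named[key]]);                //    named value
  }
  out.push(["PUSH", positional.length]);           // 4. positional count
  out.push(["PUSH", keys.length]);                 // 5. named count
  out.push(["CALL"]);                              // 6. call
  return out;
}
```

A real compiler would emit arbitrary expression code for each argument instead of a single PUSH, but the ordering and the two trailing counts stay the same.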

Registering Native Functions

Native TypeScript functions are registered into the VM's scope and accessed like regular variables.

Method 1: Pass to run() or VM constructor

const result = await run(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`
})

// Or with VM
const vm = new VM(bytecode, { add, greet })

Method 2: Register after construction

const vm = new VM(bytecode)
vm.set('add', (a: number, b: number) => a + b)
await vm.run()

Method 3: Value-based functions (for full control)

vm.setValueFunction('customOp', (a: Value, b: Value): Value => {
  return { type: 'number', value: toNumber(a) + toNumber(b) }
})

Auto-wrapping: vm.set() automatically converts between native TypeScript types and ReefVM Value types. Both sync and async functions work.

Usage in bytecode:

; Positional arguments
LOAD add              ; Load native function from scope
PUSH 5
PUSH 10
PUSH 2                ; positionalCount
PUSH 0                ; namedCount
CALL                  ; Call like any other function

; Named arguments
LOAD greet
PUSH "name"
PUSH "Alice"
PUSH "greeting"
PUSH "Hi"
PUSH 0                ; positionalCount
PUSH 2                ; namedCount
CALL                  ; → "Hi, Alice!" (assumes greet accepts name and greeting parameters)

Named Arguments: Native functions support named arguments. Parameter names are extracted from the function signature at call time, and arguments are bound using the same priority as Reef functions (positional arg > named arg > default > null).

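@named Pattern: Because @ is not valid in a TypeScript identifier, native functions use the literal prefix at instead. Parameters whose names start with at followed by an uppercase letter (e.g., atOptions, atNamed) collect unmatched named arguments: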

// Basic @named - collects all named args
vm.set('greet', (atNamed: any = {}) => {
  return `Hello, ${atNamed.name || 'World'}!`
})

// Mixed positional and @named
vm.set('configure', (name: string, atOptions: any = {}) => {
  return {
    name,
    debug: atOptions.debug || false,
    port: atOptions.port || 3000
  }
})

Bytecode example:

; Call with mixed positional and named args
LOAD configure
PUSH "myApp"        ; positional arg → name
PUSH "debug"
PUSH true
PUSH "port"
PUSH 8080
PUSH 1              ; 1 positional arg
PUSH 2              ; 2 named args (debug, port)
CALL                ; atOptions receives {debug: true, port: 8080}

Named arguments that match fixed parameter names are bound to those parameters. Remaining unmatched named arguments are collected into the atXxx parameter as a plain JavaScript object.

Calling Functions from TypeScript

You can call both Reef and native functions from TypeScript using vm.call():

const bytecode = toBytecode(`
  MAKE_FUNCTION (name greeting="Hello") .greet
  STORE greet
  HALT

  .greet:
    LOAD greeting
    PUSH " "
    LOAD name
    PUSH "!"
    STR_CONCAT #4
    RETURN
`)

const vm = new VM(bytecode, {
  log: (msg: string) => console.log(msg)  // Native function
})
await vm.run()

// Call Reef function with positional arguments
const result1 = await vm.call('greet', 'Alice')
// Returns: "Hello Alice!"

// Call Reef function with named arguments (pass as final object)
const result2 = await vm.call('greet', 'Bob', { greeting: 'Hi' })
// Returns: "Hi Bob!"

// Call Reef function with only named arguments
const result3 = await vm.call('greet', { name: 'Carol', greeting: 'Hey' })
// Returns: "Hey Carol!"

// Call native function
await vm.call('log', 'Hello from TypeScript!')

How it works:

  • vm.call(functionName, ...args) looks up the function (Reef or native) in the VM's scope
  • For Reef functions: converts to callable JavaScript function
  • For native functions: calls directly
  • Arguments are automatically converted to ReefVM Values
  • Returns the result (automatically converted back to JavaScript types)

Named arguments: Pass a plain object as the final argument to provide named arguments. If the last argument is a non-array object, it's treated as named arguments. All preceding arguments are treated as positional.
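
The rule for separating positional from named arguments can be sketched as follows. `splitCallArgs` is a hypothetical helper written from the description above, not ReefVM's own code; treating null as a positional value is an assumption.

```typescript
// Illustrative split of vm.call-style arguments; not ReefVM code.
type NamedArgs = { [key: string]: unknown };

function splitCallArgs(args: unknown[]): { positional: unknown[]; named: NamedArgs } {
  const last = args[args.length - 1];
  const isPlainObject =
    typeof last === "object" && last !== null && !Array.isArray(last);
  if (isPlainObject) {
    // A trailing non-array object supplies the named arguments.
    return { positional: args.slice(0, -1), named: last as NamedArgs };
  }
  return { positional: args, named: {} };
}
```

Under this rule, a dict intended as the final positional argument would be read as named arguments, which is worth keeping in mind when a function's last parameter takes a dict.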

Type conversion: Arguments and return values are automatically converted between JavaScript types and ReefVM Values:

  • Primitives: number, string, boolean, null
  • Arrays: converted recursively
  • Objects: converted to ReefVM dicts
  • Functions: Reef functions are converted to callable JavaScript functions

REPL Mode (Incremental Compilation)

ReefVM supports incremental bytecode execution for building REPLs. This allows you to execute code line-by-line while preserving scope and avoiding re-execution of side effects.

The Problem: By default, vm.run() resets the program counter (PC) to 0, re-executing all previous bytecode. This makes it impossible to implement a REPL where each line executes only once.

The Solution: Use vm.continue() to resume execution from where you left off:

// Line 1: Define variable
const line1 = toBytecode([
  ["PUSH", 42],
  ["STORE", "x"]
])

const vm = new VM(line1)
await vm.run()  // Execute first line

// Line 2: Use the variable
const line2 = toBytecode([
  ["LOAD", "x"],
  ["PUSH", 10],
  ["ADD"]
])

vm.appendBytecode(line2)  // Append new bytecode with proper constant remapping
await vm.continue()       // Execute ONLY the new bytecode

// Result: 52 (42 + 10)
// The first line never re-executed!

Key methods:

  • vm.run(): Resets PC to 0 and runs from the beginning (normal execution)
  • vm.continue(): Continues from current PC (REPL mode)
  • vm.appendBytecode(bytecode): Helper that properly appends bytecode with constant index remapping

Important: Don't use HALT in REPL mode! The VM naturally stops when it runs out of instructions. Using HALT sets vm.stopped = true, which prevents continue() from resuming.
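
The difference between run() and continue(), and the effect of HALT, can be seen in a toy model. Everything here is an assumption made for illustration: instructions are plain strings, "executing" one only records it, and the class is not ReefVM's VM.

```typescript
// Toy model of the program counter behaviour described above.
class ToyVM {
  private instructions: string[] = [];
  private pc = 0;
  stopped = false;
  log: string[] = []; // records every instruction that was executed

  appendBytecode(more: string[]): void {
    this.instructions.push(...more);
  }

  run(): void {
    this.pc = 0; // run() restarts from the beginning
    this.stopped = false;
    this.continue();
  }

  continue(): void {
    // Resume from the current PC until instructions run out or HALT is hit.
    while (!this.stopped && this.pc < this.instructions.length) {
      const instr = this.instructions[this.pc++];
      if (instr === "HALT") this.stopped = true;
      else this.log.push(instr);
    }
  }
}

const repl = new ToyVM();
repl.appendBytecode(["line1"]);
repl.run();                      // executes line1
repl.appendBytecode(["line2"]);
repl.continue();                 // executes only line2

const halted = new ToyVM();
halted.appendBytecode(["line1", "HALT"]);
halted.run();
halted.appendBytecode(["line2"]);
halted.continue();               // does nothing: the toy VM is stopped
```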

Example REPL pattern:

const vm = new VM(toBytecode([]), { /* native functions */ })

while (true) {
  const input = await getUserInput()  // Get next line from user
  const bytecode = compileLine(input)  // Compile to bytecode (no HALT!)

  vm.appendBytecode(bytecode)  // Append to VM
  const result = await vm.continue()  // Execute only the new code

  console.log(fromValue(result))  // Show result to user
}

This pattern ensures:

  • Variables persist between lines
  • Side effects (like echo or function calls) only run once
  • Previous bytecode never re-executes
  • Scope accumulates across all lines

Empty Stack

  • RETURN with empty stack returns null
  • HALT with empty stack returns null