probablycorey/ReefVM

forked from defunkt/ReefVM

Chris Wanstrath 1fb5effb0a add REPL support

2025-10-25 09:10:43 -07:00

Reef Compiler Guide

Quick reference for compiling to Reef bytecode.

Bytecode Formats

ReefVM supports two bytecode formats:

String format: Human-readable text with opcodes and operands
Array format: TypeScript arrays with typed tuples for programmatic generation

Both formats are compiled using the same toBytecode() function.

Bytecode Syntax

Instructions

OPCODE operand     ; comment

Operand Types

Immediate numbers (#N): Counts, relative offsets, or absolute instruction indices, depending on the opcode

MAKE_ARRAY #3 - count of 3 items
JUMP #5 - relative offset of 5 instructions (prefer labels)
PUSH_TRY #10 - absolute instruction index (prefer labels)

Labels (.name): Symbolic addresses resolved at parse time

.label: - define label at current position
JUMP .loop - jump to label
MAKE_FUNCTION (x) .body - function body at label
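
To make "resolved at parse time" concrete, here is a minimal sketch of a two-pass label resolver. It is an illustration only: resolveLabels and its flat string[] line shape are hypothetical, it maps every label to an absolute instruction index for simplicity, and it would misread a string constant that happens to start with a dot. ReefVM's actual parser may work differently.

```typescript
// Hypothetical two-pass label resolver; not ReefVM's actual implementation.
type Line = string[] // e.g. [".loop:"] or ["JUMP", ".loop"]

function resolveLabels(lines: Line[]): (string | number)[][] {
  const addresses = new Map<string, number>()
  const instructions: Line[] = []

  // Pass 1: record the instruction index each label points at,
  // and drop the label definition from the instruction stream.
  for (const line of lines) {
    const head = line[0] ?? ""
    if (line.length === 1 && head.startsWith(".") && head.endsWith(":")) {
      addresses.set(head.slice(0, -1), instructions.length)
    } else {
      instructions.push(line)
    }
  }

  // Pass 2: replace label references (operands starting with ".")
  // with the index recorded in pass 1.
  return instructions.map(line =>
    line.map((token, i) => {
      if (i === 0 || !token.startsWith(".")) return token
      const address = addresses.get(token)
      if (address === undefined) throw new Error(`Unknown label: ${token}`)
      return address
    })
  )
}

const resolved = resolveLabels([
  [".loop:"],
  ["LOAD", "x"],
  ["JUMP_IF_FALSE", ".end"],
  ["JUMP", ".loop"],
  [".end:"],
  ["HALT"],
])
// resolved: [["LOAD","x"], ["JUMP_IF_FALSE",3], ["JUMP",0], ["HALT"]]
```

Because definitions are collected before references are rewritten, forward references such as .end work without any backpatching.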

Variable names: Plain identifiers (supports Unicode and emoji!)

LOAD counter - load variable
STORE result - store variable
LOAD 💎 - load emoji variable
STORE 変数 - store Unicode variable

Constants: Literals added to constants pool

Numbers: PUSH 42, PUSH 3.14
Strings: PUSH "hello" or PUSH 'world'
Booleans: PUSH true, PUSH false
Null: PUSH null

Array Format

The programmatic array format uses TypeScript tuples for type safety:

import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],        // Atom values: number | string | boolean | null
  ["STORE", "x"],      // Variable names as strings
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)

Operand Types in Array Format

Atoms (number | string | boolean | null): Constants for PUSH

["PUSH", 42]
["PUSH", "hello"]
["PUSH", true]
["PUSH", null]

Variable names: String identifiers

["LOAD", "counter"]
["STORE", "result"]

Label definitions: Single-element arrays starting with . and ending with :

[".loop:"]
[".end:"]
[".function_body:"]

Label references: Strings in jump/function instructions

["JUMP", ".loop"]
["JUMP_IF_FALSE", ".end"]
["MAKE_FUNCTION", ["x", "y"], ".body"]
["PUSH_TRY", ".catch"]

Counts: Numbers for array/dict construction

["MAKE_ARRAY", 3]    // Pop 3 items
["MAKE_DICT", 2]     // Pop 2 key-value pairs

Functions in Array Format

// Basic function
["MAKE_FUNCTION", ["x", "y"], ".body"]

// With defaults
["MAKE_FUNCTION", ["x", "y=10"], ".body"]

// Variadic
["MAKE_FUNCTION", ["...args"], ".body"]

// Named args
["MAKE_FUNCTION", ["@opts"], ".body"]

// Mixed
["MAKE_FUNCTION", ["x", "y=5", "...rest", "@opts"], ".body"]

Complete Example

const factorial = toBytecode([
  ["MAKE_FUNCTION", ["n", "acc=1"], ".fact"],
  ["STORE", "factorial"],
  ["JUMP", ".main"],

  [".fact:"],
  ["LOAD", "n"],
  ["PUSH", 0],
  ["LTE"],
  ["JUMP_IF_FALSE", ".recurse"],
  ["LOAD", "acc"],
  ["RETURN"],

  [".recurse:"],
  ["LOAD", "factorial"],
  ["LOAD", "n"],
  ["PUSH", 1],
  ["SUB"],
  ["LOAD", "n"],
  ["LOAD", "acc"],
  ["MUL"],
  ["PUSH", 2],
  ["PUSH", 0],
  ["TAIL_CALL"],

  [".main:"],
  ["LOAD", "factorial"],
  ["PUSH", 5],
  ["PUSH", 1],
  ["PUSH", 0],
  ["CALL"],
  ["HALT"]
])

const result = await run(factorial)  // { type: "number", value: 120 }

String Format

Functions

MAKE_FUNCTION (x y) .body       ; Basic
MAKE_FUNCTION (x=10 y=20) .body ; Defaults
MAKE_FUNCTION (x ...rest) .body ; Variadic
MAKE_FUNCTION (x @named) .body  ; Named args
MAKE_FUNCTION (x ...rest @named) .body ; Both

Function Calls

Stack order (bottom to top):

LOAD fn
PUSH arg1           ; Positional args
PUSH arg2
PUSH "name"         ; Named arg key
PUSH "value"        ; Named arg value
PUSH 2              ; Positional count
PUSH 1              ; Named count
CALL
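
A compiler usually wants this sequence generated for it. The helper below is a small sketch that emits the stack order above as array-format tuples; emitCall and the local Atom/Instruction types are hypothetical names for illustration and are not part of ReefVM's API.

```typescript
// Array-format tuples as described in this guide; the helper itself is hypothetical.
type Atom = number | string | boolean | null
type Instruction = [string] | [string, Atom]

function emitCall(
  fn: string,
  positional: Atom[],
  named: Record<string, Atom> = {}
): Instruction[] {
  const namedEntries = Object.entries(named)
  return [
    ["LOAD", fn],                                            // function goes on first
    ...positional.map((arg): Instruction => ["PUSH", arg]),  // positional args, in order
    ...namedEntries.flatMap(([key, value]): Instruction[] => [
      ["PUSH", key],                                         // named arg key
      ["PUSH", value],                                       // named arg value
    ]),
    ["PUSH", positional.length],                             // positional count
    ["PUSH", namedEntries.length],                           // named count
    ["CALL"],
  ]
}

const call = emitCall("greet", ["Alice"], { greeting: "Hi" })
// call: LOAD greet, PUSH "Alice", PUSH "greeting", PUSH "Hi", PUSH 1, PUSH 1, CALL
```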

Opcodes

Stack

PUSH <const> - Push constant
POP - Remove top
DUP - Duplicate top

Variables

LOAD <name> - Push variable value (throws if not found)
TRY_LOAD <name> - Push variable value if found, otherwise push name as string (never throws)
STORE <name> - Pop and store in variable

Arithmetic

ADD, SUB, MUL, DIV, MOD - Binary ops (pop 2, push result)

Comparison

EQ, NEQ, LT, GT, LTE, GTE - Pop 2, push boolean

Logic

NOT - Pop 1, push !value

Control Flow

JUMP .label - Unconditional jump
JUMP_IF_FALSE .label - Jump if top is false or null (pops value)
JUMP_IF_TRUE .label - Jump if top is truthy (pops value)
HALT - Stop execution of the program

Functions

MAKE_FUNCTION (params) .body - Create function, push to stack
CALL - Call function (see calling convention above)
TAIL_CALL - Tail-recursive call (no stack growth)
RETURN - Return from function (pops return value)
TRY_CALL <name> - Call the named function if it is one; push its value if the variable exists but is not a function; push the name as a string if it is undefined
BREAK - Exit iterator/loop (unwinds to break target)

Arrays

MAKE_ARRAY #N - Pop N items, push array
ARRAY_GET - Pop index and array, push element
ARRAY_SET - Pop value, index, array; mutate array
ARRAY_PUSH - Pop value and array, append to array
ARRAY_LEN - Pop array, push length

Dicts

MAKE_DICT #N - Pop N key-value pairs, push dict
DICT_GET - Pop key and dict, push value (or null)
DICT_SET - Pop value, key, dict; mutate dict
DICT_HAS - Pop key and dict, push boolean

Unified Access

DOT_GET - Pop index/key and array/dict, push value (null if missing)

Strings

STR_CONCAT #N - Pop N values, convert to strings, concatenate, push result

Exceptions

PUSH_TRY .catch - Register exception handler
PUSH_FINALLY .finally - Add finally to current handler
POP_TRY - Remove handler (try succeeded)
THROW - Throw exception (pops error value)

Compiler Patterns

If-Else

<condition>
JUMP_IF_FALSE .else
  <then-block>
  JUMP .end
.else:
  <else-block>
.end:
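
Nested conditionals need distinct label names, so a compiler typically generates them. The sketch below emits the if-else pattern above in array format with unique labels; compileIfElse, freshLabel, and the local types are hypothetical helpers, not ReefVM functions.

```typescript
type Atom = number | string | boolean | null
type Instruction = [string] | [string, Atom]

let labelCounter = 0

// Generate a unique label name so nested if-else blocks do not collide.
function freshLabel(prefix: string): string {
  return `.${prefix}_${labelCounter++}`
}

function compileIfElse(
  condition: Instruction[],
  thenBlock: Instruction[],
  elseBlock: Instruction[]
): Instruction[] {
  const elseLabel = freshLabel("else")
  const endLabel = freshLabel("end")
  return [
    ...condition,
    ["JUMP_IF_FALSE", elseLabel], // pops the condition; jumps when false or null
    ...thenBlock,
    ["JUMP", endLabel],           // skip the else block after the then block
    [`${elseLabel}:`],
    ...elseBlock,
    [`${endLabel}:`],
  ]
}

const program = compileIfElse(
  [["LOAD", "x"]],
  [["PUSH", "yes"]],
  [["PUSH", "no"]]
)
```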

While Loop

.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  JUMP .loop
.end:

For Loop

<init>
.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  <increment>
  JUMP .loop
.end:

Continue

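There is no CONTINUE opcode. Use a backward jump to the loop start (in a for loop, jump to a label placed before the increment instead, so the increment still runs):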

.loop:
  <condition>
  JUMP_IF_FALSE .end
  <early-check>
  JUMP_IF_TRUE .loop    ; continue
  <body>
  JUMP .loop
.end:

Break in Loop

Mark iterator function as break target, use BREAK opcode:

MAKE_FUNCTION () .each_body
STORE each
LOAD collection
LOAD each
<call-iterator-with-break-semantics>
HALT

.each_body:
  <condition>
  JUMP_IF_TRUE .done
  <body>
  BREAK                  ; exits to caller
.done:
  RETURN

Short-Circuit AND

<left>
DUP
JUMP_IF_FALSE .end      ; Short-circuit if false
POP
<right>
.end:                    ; Result on stack

Short-Circuit OR

<left>
DUP
JUMP_IF_TRUE .end       ; Short-circuit if true
POP
<right>
.end:                    ; Result on stack

Try-Catch

PUSH_TRY .catch
  <try-block>
  POP_TRY
  JUMP .end
.catch:
  STORE err
  <catch-block>
.end:

Try-Catch-Finally

PUSH_TRY .catch
PUSH_FINALLY .finally
  <try-block>
  POP_TRY
  JUMP .finally         ; Compiler must generate this
.catch:
  STORE err
  <catch-block>
  JUMP .finally         ; And this
.finally:
  <finally-block>       ; Executes in both paths
.end:

Important: VM only auto-jumps to finally on THROW. For successful try/catch, compiler must explicitly JUMP to finally.
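
The two explicit jumps are easy to forget, so it can help to centralize the pattern in one emitter. This is a sketch under the same assumptions as the pattern above; compileTryCatchFinally, its id parameter (used only to keep labels unique), and the local types are hypothetical.

```typescript
type Atom = number | string | boolean | null
type Instruction = [string] | [string, Atom]

function compileTryCatchFinally(
  id: number,
  tryBlock: Instruction[],
  catchBlock: Instruction[],
  finallyBlock: Instruction[],
  errName = "err"
): Instruction[] {
  const catchLabel = `.catch_${id}`
  const finallyLabel = `.finally_${id}`
  const endLabel = `.end_${id}`
  return [
    ["PUSH_TRY", catchLabel],
    ["PUSH_FINALLY", finallyLabel],
    ...tryBlock,
    ["POP_TRY"],
    ["JUMP", finallyLabel], // success path: the compiler must emit this jump
    [`${catchLabel}:`],
    ["STORE", errName],     // bind the thrown value for the catch block
    ...catchBlock,
    ["JUMP", finallyLabel], // catch path: emitted explicitly as well
    [`${finallyLabel}:`],
    ...finallyBlock,
    [`${endLabel}:`],
  ]
}

const guarded = compileTryCatchFinally(
  0,
  [["PUSH", 1]],
  [["LOAD", "err"]],
  [["PUSH", "done"]]
)
```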

Closures

Functions automatically capture current scope:

PUSH 0
STORE counter
MAKE_FUNCTION () .increment
RETURN

.increment:
  LOAD counter          ; Captured variable
  PUSH 1
  ADD
  STORE counter
  LOAD counter
  RETURN

Tail Recursion

Use TAIL_CALL instead of CALL for last call:

MAKE_FUNCTION (n acc) .factorial
STORE factorial
<...>

.factorial:
  LOAD n
  PUSH 0
  LTE
  JUMP_IF_FALSE .recurse
  LOAD acc
  RETURN
.recurse:
  LOAD factorial
  LOAD n
  PUSH 1
  SUB
  LOAD n
  LOAD acc
  MUL
  PUSH 2
  PUSH 0
  TAIL_CALL             ; Reuses stack frame

Optional Function Calls (TRY_CALL)

Call function if defined, otherwise use value or name as string:

; Define optional hook
MAKE_FUNCTION () .onInit
STORE onInit

; Later: call if defined, skip if not
TRY_CALL onInit       ; Calls onInit() if it's a function
                       ; Pushes value if it exists but isn't a function
                       ; Pushes "onInit" as string if undefined

; Use with values
PUSH 42
STORE answer
TRY_CALL answer        ; Pushes 42 (not a function)

; Use with undefined
TRY_CALL unknown       ; Pushes "unknown" as string

Use Cases:

Optional hooks/callbacks in DSLs
Shell-like languages where unknown identifiers become strings
Templating systems with optional transformers

String Concatenation

Build strings from multiple values:

; Simple concatenation
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3           ; → "Hello World"

; With variables
PUSH "Name: "
LOAD userName
STR_CONCAT #2           ; → "Name: Alice"

; With expressions and type coercion
PUSH "Result: "
PUSH 10
PUSH 5
ADD
STR_CONCAT #2           ; → "Result: 15"

; Template-like interpolation
PUSH "User "
LOAD userId
PUSH " has "
LOAD count
PUSH " items"
STR_CONCAT #5           ; → "User 42 has 3 items"

Composability: Results can be concatenated again

PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3
PUSH "!"
STR_CONCAT #2           ; → "Hello World!"

Unified Access (DOT_GET)

DOT_GET provides a single opcode for accessing both arrays and dicts:

; Array access
PUSH 10
PUSH 20
PUSH 30
MAKE_ARRAY #3
PUSH 1
DOT_GET                 ; → 20

; Dict access
PUSH 'name'
PUSH 'Alice'
MAKE_DICT #1
PUSH 'name'
DOT_GET                 ; → 'Alice'

Chained access:

; Access dict['users'][0]['name']
LOAD dict
PUSH 'users'
DOT_GET                 ; Get users array
PUSH 0
DOT_GET                 ; Get first user
PUSH 'name'
DOT_GET                 ; Get name field

With variables:

LOAD data
LOAD key                ; Key can be string or number
DOT_GET                 ; Works for both array and dict

Null safety: Returns null for missing keys or out-of-bounds indices

MAKE_ARRAY #0
PUSH 0
DOT_GET                 ; → null (empty array)

MAKE_DICT #0
PUSH 'key'
DOT_GET                 ; → null (missing key)

Key Concepts

Truthiness

Only null and false are falsy. Everything else (including 0, "", empty arrays/dicts) is truthy.
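
As a sketch of that rule in TypeScript: the Value union below is an assumption modeled on the { type, value } objects shown elsewhere in this guide, and isTruthy is a hypothetical name rather than a ReefVM export.

```typescript
// Assumed Value shape, based on the { type, value } objects shown in this guide.
type Value =
  | { type: "null"; value: null }
  | { type: "boolean"; value: boolean }
  | { type: "number"; value: number }
  | { type: "string"; value: string }
  | { type: "array"; value: Value[] }

function isTruthy(v: Value): boolean {
  if (v.type === "null") return false      // null is falsy
  if (v.type === "boolean") return v.value // false is falsy, true is truthy
  return true                              // 0, "", and empty arrays are all truthy
}

const zeroIsTruthy = isTruthy({ type: "number", value: 0 }) // true
```

This differs from JavaScript, where 0 and "" are falsy, so a compiler targeting Reef should not rely on JavaScript truthiness when folding constants.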

Type Coercion

toNumber:

number → identity
string → parseFloat (or 0 if invalid)
boolean → 1 (true) or 0 (false)
null → 0
Others → 0

toString:

string → identity
number → string representation
boolean → "true" or "false"
null → "null"
function → "<function>"
array → "[item, item]"
dict → "{key: value, ...}"

Arithmetic ops (ADD, SUB, MUL, DIV, MOD) coerce both operands to numbers.

Comparison ops (LT, GT, LTE, GTE) coerce both operands to numbers.

Equality ops (EQ, NEQ) use type-aware comparison with deep equality for arrays/dicts.

Note: There is no string concatenation operator. ADD only works with numbers.
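
The toNumber rules and their effect on ADD can be sketched as follows. The Value union and both function bodies are assumptions written from the table above, not ReefVM's source.

```typescript
// Assumed Value shape, based on the { type, value } objects shown in this guide.
type Value =
  | { type: "null"; value: null }
  | { type: "boolean"; value: boolean }
  | { type: "number"; value: number }
  | { type: "string"; value: string }

function toNumber(v: Value): number {
  switch (v.type) {
    case "number":
      return v.value
    case "string": {
      const n = parseFloat(v.value)
      return Number.isNaN(n) ? 0 : n // invalid strings become 0
    }
    case "boolean":
      return v.value ? 1 : 0
    default:
      return 0 // null and anything else
  }
}

// ADD coerces both operands to numbers, so two strings are summed, not joined.
function add(a: Value, b: Value): Value {
  return { type: "number", value: toNumber(a) + toNumber(b) }
}

const sum = add({ type: "string", value: "1" }, { type: "string", value: "2" })
// sum: { type: "number", value: 3 }
```

To join strings, emit STR_CONCAT instead of ADD.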

Scope

Variables resolved through parent scope chain
STORE updates existing variable or creates in current scope
Functions capture scope at definition time

Identifiers

Variable and function parameter names support Unicode and emoji:

Valid: 💎, 🌟, 変数, counter, _private
Invalid: Cannot start with digits, ., #, @, or ...
Invalid: Cannot contain whitespace or special chars: ;, (), [], {}, =, ', "
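
A compiler can check names against these rules before emitting LOAD or STORE. The validator below is a sketch of the rules as listed; isValidIdentifier is a hypothetical helper, and rejecting the empty string is an added assumption.

```typescript
function isValidIdentifier(name: string): boolean {
  if (name.length === 0) return false               // assumption: empty names are rejected
  if (/^[0-9.#@]/.test(name)) return false          // digit, ".", "#", "@" (also covers "...")
  if (/[\s;()\[\]{}='"]/.test(name)) return false   // whitespace and special characters
  return true
}

const checks = ["💎", "変数", "_private", "1st", ".x", "...rest", "a b"].map(isValidIdentifier)
// checks: [true, true, true, false, false, false, false]
```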

Break Semantics

CALL marks current frame as break target
BREAK unwinds call stack to that target
Used for Ruby-style iterator pattern

Parameter Binding Priority

For function calls, each parameter is bound from the first of these sources that applies:

Positional argument (if provided)
Named argument (if provided and matches param name)
Default value (if defined)
Null

Exception Handlers

PUSH_TRY uses absolute addresses for catch blocks
Nested try blocks form a stack
THROW unwinds to most recent handler and jumps to finally (if present) or catch
VM does NOT automatically jump to finally on success - compiler must generate JUMPs
Finally execution in all cases is compiler's responsibility, not VM's

Calling Convention

All calls (including native functions) push arguments in order:

Function
Positional args (in order)
Named args (key1, val1, key2, val2, ...)
Positional count (as number)
Named count (as number)
CALL or TAIL_CALL

Native functions use the same calling convention as Reef functions. They are registered into scope and called via LOAD + CALL.

Registering Native Functions

Native TypeScript functions are registered into the VM's scope and accessed like regular variables.

Method 1: Pass to run() or VM constructor

const result = await run(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`
})

// Or with VM
const vm = new VM(bytecode, { add, greet })

Method 2: Register after construction

const vm = new VM(bytecode)
vm.registerFunction('add', (a: number, b: number) => a + b)
await vm.run()

Method 3: Value-based functions (for full control)

vm.registerValueFunction('customOp', (a: Value, b: Value): Value => {
  return { type: 'number', value: toNumber(a) + toNumber(b) }
})

Auto-wrapping: registerFunction automatically converts between native TypeScript types and ReefVM Value types. Both sync and async functions work.

Usage in bytecode:

; Positional arguments
LOAD add              ; Load native function from scope
PUSH 5
PUSH 10
PUSH 2                ; positionalCount
PUSH 0                ; namedCount
CALL                  ; Call like any other function

; Named arguments
LOAD greet            ; assumes a greet that takes name and greeting parameters
PUSH "name"
PUSH "Alice"
PUSH "greeting"
PUSH "Hi"
PUSH 0                ; positionalCount
PUSH 2                ; namedCount
CALL                  ; → "Hi, Alice!"

Named Arguments: Native functions support named arguments. Parameter names are extracted from the function signature at call time, and arguments are bound using the same priority as Reef functions (see Parameter Binding Priority).

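@named Pattern: Because @ is not valid in a TypeScript identifier, native functions use the prefix "at" instead. Parameters whose names start with at followed by an uppercase letter (e.g., atOptions, atNamed) collect unmatched named arguments: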

// Basic @named - collects all named args
vm.registerFunction('greet', (atNamed: any = {}) => {
  return `Hello, ${atNamed.name || 'World'}!`
})

// Mixed positional and @named
vm.registerFunction('configure', (name: string, atOptions: any = {}) => {
  return {
    name,
    debug: atOptions.debug || false,
    port: atOptions.port || 3000
  }
})

Bytecode example:

; Call with mixed positional and named args
LOAD configure
PUSH "myApp"        ; positional arg → name
PUSH "debug"
PUSH true
PUSH "port"
PUSH 8080
PUSH 1              ; 1 positional arg
PUSH 2              ; 2 named args (debug, port)
CALL                ; atOptions receives {debug: true, port: 8080}

Named arguments that match fixed parameter names are bound to those parameters. Remaining unmatched named arguments are collected into the atXxx parameter as a plain JavaScript object.

Calling Functions from TypeScript

You can call both Reef and native functions from TypeScript using vm.call():

const bytecode = toBytecode(`
  MAKE_FUNCTION (name greeting="Hello") .greet
  STORE greet
  HALT

  .greet:
    LOAD greeting
    PUSH " "
    LOAD name
    PUSH "!"
    STR_CONCAT #4
    RETURN
`)

const vm = new VM(bytecode, {
  log: (msg: string) => console.log(msg)  // Native function
})
await vm.run()

// Call Reef function with positional arguments
const result1 = await vm.call('greet', 'Alice')
// Returns: "Hello Alice!"

// Call Reef function with named arguments (pass as final object)
const result2 = await vm.call('greet', 'Bob', { greeting: 'Hi' })
// Returns: "Hi Bob!"

// Call Reef function with only named arguments
const result3 = await vm.call('greet', { name: 'Carol', greeting: 'Hey' })
// Returns: "Hey Carol!"

// Call native function
await vm.call('log', 'Hello from TypeScript!')

How it works:

vm.call(functionName, ...args) looks up the function (Reef or native) in the VM's scope
For Reef functions: converts to callable JavaScript function
For native functions: calls directly
Arguments are automatically converted to ReefVM Values
Returns the result (automatically converted back to JavaScript types)

Named arguments: Pass a plain object as the final argument to provide named arguments. If the last argument is a non-array object, it's treated as named arguments. All preceding arguments are treated as positional.
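
The rule for separating positional from named arguments can be sketched like this. splitCallArgs is a hypothetical helper, not ReefVM's implementation, and it treats any non-null, non-array object as named arguments, which may be broader than what vm.call() accepts.

```typescript
type Named = Record<string, unknown>

function splitCallArgs(args: unknown[]): { positional: unknown[]; named: Named } {
  const last = args[args.length - 1]
  // typeof null is "object", so null has to be excluded explicitly.
  const isNamed = typeof last === "object" && last !== null && !Array.isArray(last)
  return isNamed
    ? { positional: args.slice(0, -1), named: last as Named }
    : { positional: args, named: {} }
}

const mixed = splitCallArgs(["Bob", { greeting: "Hi" }])
// mixed: { positional: ["Bob"], named: { greeting: "Hi" } }

const arrayLast = splitCallArgs([[1, 2]])
// arrayLast: { positional: [[1, 2]], named: {} }
```

One consequence of this rule is that a dict intended as the final positional argument is read as named arguments instead.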

Type conversion: Arguments and return values are automatically converted between JavaScript types and ReefVM Values:

Primitives: number, string, boolean, null
Arrays: converted recursively
Objects: converted to ReefVM dicts
Functions: Reef functions are converted to callable JavaScript functions

REPL Mode (Incremental Compilation)

ReefVM supports incremental bytecode execution for building REPLs. This allows you to execute code line-by-line while preserving scope and avoiding re-execution of side effects.

The Problem: By default, vm.run() resets the program counter (PC) to 0, re-executing all previous bytecode. This makes it impossible to implement a REPL where each line executes only once.

The Solution: Use vm.continue() to resume execution from where you left off:

// Line 1: Define variable
const line1 = toBytecode([
  ["PUSH", 42],
  ["STORE", "x"]
])

const vm = new VM(line1)
await vm.run()  // Execute first line

// Line 2: Use the variable
const line2 = toBytecode([
  ["LOAD", "x"],
  ["PUSH", 10],
  ["ADD"]
])

vm.appendBytecode(line2)  // Append new bytecode with proper constant remapping
await vm.continue()       // Execute ONLY the new bytecode

// Result: 52 (42 + 10)
// The first line never re-executed!

Key methods:

vm.run(): Resets PC to 0 and runs from the beginning (normal execution)
vm.continue(): Continues from current PC (REPL mode)
vm.appendBytecode(bytecode): Helper that properly appends bytecode with constant index remapping
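
To illustrate why constant indices need remapping when bytecode is appended, here is a minimal sketch. The Bytecode shape (an instruction list plus a constants pool indexed by PUSH) is an assumption made for this example, and this standalone appendBytecode is hypothetical rather than the VM method's real implementation.

```typescript
// Assumed layout: PUSH operands are indices into a per-chunk constants pool.
type Instr = { op: string; operand?: number | string }
type Bytecode = { instructions: Instr[]; constants: unknown[] }

function appendBytecode(base: Bytecode, extra: Bytecode): void {
  // Constants from the new chunk land after the existing ones,
  // so every constant index in the new chunk shifts by this offset.
  const offset = base.constants.length
  base.constants.push(...extra.constants)

  for (const instr of extra.instructions) {
    const remapped: Instr =
      instr.op === "PUSH" && typeof instr.operand === "number"
        ? { ...instr, operand: instr.operand + offset }
        : { ...instr }
    base.instructions.push(remapped)
  }
  // Note: in this sketch only constant indices are remapped. Address-bearing
  // operands in the appended chunk would need a similar shift.
}

const session: Bytecode = {
  instructions: [{ op: "PUSH", operand: 0 }, { op: "STORE", operand: "x" }],
  constants: [42],
}
const line2: Bytecode = {
  instructions: [{ op: "LOAD", operand: "x" }, { op: "PUSH", operand: 0 }, { op: "ADD" }],
  constants: [10],
}
appendBytecode(session, line2)
// session.constants: [42, 10]; the appended PUSH now refers to index 1
```

Without the offset, the appended PUSH would still point at index 0 and load 42 instead of 10.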

Important: Don't use HALT in REPL mode! The VM naturally stops when it runs out of instructions. Using HALT sets vm.stopped = true, which prevents continue() from resuming.

Example REPL pattern:

const vm = new VM(toBytecode([]), { /* native functions */ })

while (true) {
  const input = await getUserInput()  // Get next line from user
  const bytecode = compileLine(input)  // Compile to bytecode (no HALT!)

  vm.appendBytecode(bytecode)  // Append to VM
  const result = await vm.continue()  // Execute only the new code

  console.log(fromValue(result))  // Show result to user
}

This pattern ensures:

Variables persist between lines
Side effects (like echo or function calls) only run once
Previous bytecode never re-executes
Scope accumulates across all lines

Empty Stack

RETURN with empty stack returns null
HALT with empty stack returns null
