ReefVM/GUIDE.md

13 KiB

Reef Compiler Guide

Quick reference for compiling to Reef bytecode.

Bytecode Formats

ReefVM supports two bytecode formats:

  1. String format: Human-readable text with opcodes and operands
  2. Array format: TypeScript arrays with typed tuples for programmatic generation

Both formats are compiled using the same toBytecode() function.

Bytecode Syntax

Instructions

OPCODE operand     ; comment

Operand Types

Immediate numbers (#N): Counts or relative offsets

  • MAKE_ARRAY #3 - count of 3 items
  • JUMP #5 - relative offset of 5 instructions (prefer labels)
  • PUSH_TRY #10 - absolute instruction index (prefer labels)

Labels (.name): Symbolic addresses resolved at parse time

  • .label: - define label at current position
  • JUMP .loop - jump to label
  • MAKE_FUNCTION (x) .body - function body at label

Variable names: Plain identifiers (supports Unicode and emoji!)

  • LOAD counter - load variable
  • STORE result - store variable
  • LOAD 💎 - load emoji variable
  • STORE 変数 - store Unicode variable

Constants: Literals added to constants pool

  • Numbers: PUSH 42, PUSH 3.14
  • Strings: PUSH "hello" or PUSH 'world'
  • Booleans: PUSH true, PUSH false
  • Null: PUSH null

Native function names: Registered TypeScript functions

  • CALL_NATIVE print

Array Format

The programmatic array format uses TypeScript tuples for type safety:

import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],        // Atom values: number | string | boolean | null
  ["STORE", "x"],      // Variable names as strings
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)

Operand Types in Array Format

Atoms (number | string | boolean | null): Constants for PUSH

["PUSH", 42]
["PUSH", "hello"]
["PUSH", true]
["PUSH", null]

Variable names: String identifiers

["LOAD", "counter"]
["STORE", "result"]

Label definitions: Single-element arrays starting with . and ending with :

[".loop:"]
[".end:"]
[".function_body:"]

Label references: Strings in jump/function instructions

["JUMP", ".loop"]
["JUMP_IF_FALSE", ".end"]
["MAKE_FUNCTION", ["x", "y"], ".body"]
["PUSH_TRY", ".catch"]

Counts: Numbers for array/dict construction

["MAKE_ARRAY", 3]    // Pop 3 items
["MAKE_DICT", 2]     // Pop 2 key-value pairs

Native function names: Strings for registered functions

["CALL_NATIVE", "print"]

Functions in Array Format

// Basic function
["MAKE_FUNCTION", ["x", "y"], ".body"]

// With defaults
["MAKE_FUNCTION", ["x", "y=10"], ".body"]

// Variadic
["MAKE_FUNCTION", ["...args"], ".body"]

// Named args
["MAKE_FUNCTION", ["@opts"], ".body"]

// Mixed
["MAKE_FUNCTION", ["x", "y=5", "...rest", "@opts"], ".body"]

Complete Example

const factorial = toBytecode([
  ["MAKE_FUNCTION", ["n", "acc=1"], ".fact"],
  ["STORE", "factorial"],
  ["JUMP", ".main"],

  [".fact:"],
  ["LOAD", "n"],
  ["PUSH", 0],
  ["LTE"],
  ["JUMP_IF_FALSE", ".recurse"],
  ["LOAD", "acc"],
  ["RETURN"],

  [".recurse:"],
  ["LOAD", "factorial"],
  ["LOAD", "n"],
  ["PUSH", 1],
  ["SUB"],
  ["LOAD", "n"],
  ["LOAD", "acc"],
  ["MUL"],
  ["PUSH", 2],
  ["PUSH", 0],
  ["TAIL_CALL"],

  [".main:"],
  ["LOAD", "factorial"],
  ["PUSH", 5],
  ["PUSH", 1],
  ["PUSH", 0],
  ["CALL"],
  ["HALT"]
])

const result = await run(factorial)  // { type: "number", value: 120 }

String Format

Functions

MAKE_FUNCTION (x y) .body       ; Basic
MAKE_FUNCTION (x=10 y=20) .body ; Defaults
MAKE_FUNCTION (x ...rest) .body ; Variadic
MAKE_FUNCTION (x @named) .body  ; Named args
MAKE_FUNCTION (x ...rest @named) .body ; Both

Function Calls

Stack order (bottom to top):

LOAD fn
PUSH arg1           ; Positional args
PUSH arg2
PUSH "name"         ; Named arg key
PUSH "value"        ; Named arg value
PUSH 2              ; Positional count
PUSH 1              ; Named count
CALL

Opcodes

Stack

  • PUSH <const> - Push constant
  • POP - Remove top
  • DUP - Duplicate top

Variables

  • LOAD <name> - Push variable value (throws if not found)
  • TRY_LOAD <name> - Push variable value if found, otherwise push name as string (never throws)
  • STORE <name> - Pop and store in variable

Arithmetic

  • ADD, SUB, MUL, DIV, MOD - Binary ops (pop 2, push result)

Comparison

  • EQ, NEQ, LT, GT, LTE, GTE - Pop 2, push boolean

Logic

  • NOT - Pop 1, push !value

Control Flow

  • JUMP .label - Unconditional jump
  • JUMP_IF_FALSE .label - Jump if top is false or null (pops value)
  • JUMP_IF_TRUE .label - Jump if top is truthy (pops value)
  • HALT - Stop execution of the program

Functions

  • MAKE_FUNCTION (params) .body - Create function, push to stack
  • CALL - Call function (see calling convention above)
  • TAIL_CALL - Tail-recursive call (no stack growth)
  • RETURN - Return from function (pops return value)
  • TRY_CALL <name> - Call function (if found), push value (if exists), or push name as string (if not found)
  • BREAK - Exit iterator/loop (unwinds to break target)

Arrays

  • MAKE_ARRAY #N - Pop N items, push array
  • ARRAY_GET - Pop index and array, push element
  • ARRAY_SET - Pop value, index, array; mutate array
  • ARRAY_PUSH - Pop value and array, append to array
  • ARRAY_LEN - Pop array, push length

Dicts

  • MAKE_DICT #N - Pop N key-value pairs, push dict
  • DICT_GET - Pop key and dict, push value (or null)
  • DICT_SET - Pop value, key, dict; mutate dict
  • DICT_HAS - Pop key and dict, push boolean

Strings

  • STR_CONCAT #N - Pop N values, convert to strings, concatenate, push result

Exceptions

  • PUSH_TRY .catch - Register exception handler
  • PUSH_FINALLY .finally - Add finally to current handler
  • POP_TRY - Remove handler (try succeeded)
  • THROW - Throw exception (pops error value)

Native

  • CALL_NATIVE <name> - Call registered TypeScript function (consumes entire stack as args)

Compiler Patterns

If-Else

<condition>
JUMP_IF_FALSE .else
  <then-block>
  JUMP .end
.else:
  <else-block>
.end:

While Loop

.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  JUMP .loop
.end:

For Loop

<init>
.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  <increment>
  JUMP .loop
.end:

Continue

No CONTINUE opcode. Use backward jump to loop start:

.loop:
  <condition>
  JUMP_IF_FALSE .end
  <early-check>
  JUMP_IF_TRUE .loop    ; continue
  <body>
  JUMP .loop
.end:

Break in Loop

Mark iterator function as break target, use BREAK opcode:

MAKE_FUNCTION () .each_body
STORE each
LOAD collection
LOAD each
<call-iterator-with-break-semantics>
HALT

.each_body:
  <condition>
  JUMP_IF_TRUE .done
  <body>
  BREAK                  ; exits to caller
.done:
  RETURN

Short-Circuit AND

<left>
DUP
JUMP_IF_FALSE .end      ; Short-circuit if false
POP
<right>
.end:                    ; Result on stack

Short-Circuit OR

<left>
DUP
JUMP_IF_TRUE .end       ; Short-circuit if true
POP
<right>
.end:                    ; Result on stack

Try-Catch

PUSH_TRY .catch
  <try-block>
  POP_TRY
  JUMP .end
.catch:
  STORE err
  <catch-block>
.end:

Try-Catch-Finally

PUSH_TRY .catch
PUSH_FINALLY .finally
  <try-block>
  POP_TRY
  JUMP .finally         ; Compiler must generate this
.catch:
  STORE err
  <catch-block>
  JUMP .finally         ; And this
.finally:
  <finally-block>       ; Executes in both paths
.end:

Important: VM only auto-jumps to finally on THROW. For successful try/catch, compiler must explicitly JUMP to finally.

Closures

Functions automatically capture current scope:

PUSH 0
STORE counter
MAKE_FUNCTION () .increment
RETURN

.increment:
  LOAD counter          ; Captured variable
  PUSH 1
  ADD
  STORE counter
  LOAD counter
  RETURN

Tail Recursion

Use TAIL_CALL instead of CALL for last call:

MAKE_FUNCTION (n acc) .factorial
STORE factorial
<...>

.factorial:
  LOAD n
  PUSH 0
  LTE
  JUMP_IF_FALSE .recurse
  LOAD acc
  RETURN
.recurse:
  LOAD factorial
  LOAD n
  PUSH 1
  SUB
  LOAD n
  LOAD acc
  MUL
  PUSH 2
  PUSH 0
  TAIL_CALL             ; Reuses stack frame

Optional Function Calls (TRY_CALL)

Call function if defined, otherwise use value or name as string:

; Define optional hook
MAKE_FUNCTION () .onInit
STORE onInit

; Later: call if defined, skip if not
TRY_CALL onInit       ; Calls onInit() if it's a function
                       ; Pushes value if it exists but isn't a function
                       ; Pushes "onInit" as string if undefined

; Use with values
PUSH 42
STORE answer
TRY_CALL answer        ; Pushes 42 (not a function)

; Use with undefined
TRY_CALL unknown       ; Pushes "unknown" as string

Use Cases:

  • Optional hooks/callbacks in DSLs
  • Shell-like languages where unknown identifiers become strings
  • Templating systems with optional transformers

String Concatenation

Build strings from multiple values:

; Simple concatenation
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3           ; → "Hello World"

; With variables
PUSH "Name: "
LOAD userName
STR_CONCAT #2           ; → "Name: Alice"

; With expressions and type coercion
PUSH "Result: "
PUSH 10
PUSH 5
ADD
STR_CONCAT #2           ; → "Result: 15"

; Template-like interpolation
PUSH "User "
LOAD userId
PUSH " has "
LOAD count
PUSH " items"
STR_CONCAT #5           ; → "User 42 has 3 items"

Composability: Results can be concatenated again

PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3
PUSH "!"
STR_CONCAT #2           ; → "Hello World!"

Key Concepts

Truthiness

Only null and false are falsy. Everything else (including 0, "", empty arrays/dicts) is truthy.

Type Coercion

toNumber:

  • number → identity
  • string → parseFloat (or 0 if invalid)
  • boolean → 1 (true) or 0 (false)
  • null → 0
  • Others → 0

toString:

  • string → identity
  • number → string representation
  • boolean → "true" or "false"
  • null → "null"
  • function → ""
  • array → "[item, item]"
  • dict → "{key: value, ...}"

Arithmetic ops (ADD, SUB, MUL, DIV, MOD) coerce both operands to numbers.

Comparison ops (LT, GT, LTE, GTE) coerce both operands to numbers.

Equality ops (EQ, NEQ) use type-aware comparison with deep equality for arrays/dicts.

Note: There is no string concatenation operator. ADD only works with numbers.

Scope

  • Variables resolved through parent scope chain
  • STORE updates existing variable or creates in current scope
  • Functions capture scope at definition time

Identifiers

Variable and function parameter names support Unicode and emoji:

  • Valid: 💎, 🌟, 変数, counter, _private
  • Invalid: Cannot start with digits, ., #, @, or ...
  • Invalid: Cannot contain whitespace or special chars: ;, (), [], {}, =, ', "

Break Semantics

  • CALL marks current frame as break target
  • BREAK unwinds call stack to that target
  • Used for Ruby-style iterator pattern

Parameter Binding Priority

For function calls, parameters bound in order:

  1. Positional argument (if provided)
  2. Named argument (if provided and matches param name)
  3. Default value (if defined)
  4. Null

Exception Handlers

  • PUSH_TRY uses absolute addresses for catch blocks
  • Nested try blocks form a stack
  • THROW unwinds to most recent handler and jumps to finally (if present) or catch
  • VM does NOT automatically jump to finally on success - compiler must generate JUMPs
  • Finally execution in all cases is compiler's responsibility, not VM's

Calling Convention

All calls push arguments in order:

  1. Function
  2. Positional args (in order)
  3. Named args (key1, val1, key2, val2, ...)
  4. Positional count (as number)
  5. Named count (as number)
  6. CALL or TAIL_CALL

CALL_NATIVE Behavior

Unlike CALL, CALL_NATIVE consumes the entire stack as arguments and clears the stack. The native function receives all values that were on the stack at the time of the call.

Registering Native Functions

Method 1: Pass to run() or VM constructor

const result = await run(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`
})

// Or with VM
const vm = new VM(bytecode, { add, greet })

Method 2: Register manually

const vm = new VM(bytecode)
vm.registerFunction('add', (a, b) => a + b)
await vm.run()

Functions are auto-wrapped to convert between native TypeScript and ReefVM Value types. Both sync and async functions work.

Empty Stack

  • RETURN with empty stack returns null
  • HALT with empty stack returns null