ReefVM Specification

Version 1.0

Overview

The ReefVM is a stack-based bytecode virtual machine designed for the Shrimp programming language. It supports closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.

Architecture

Components

  • Value Stack: Operand stack for computation
  • Call Stack: Call frames for function invocations
  • Exception Handlers: Stack of try/catch handlers
  • Scope Chain: Linked scopes for lexical variable resolution (includes native functions)
  • Program Counter (PC): Current instruction index
  • Constants Pool: Immutable values and function metadata

Execution Model

  1. VM loads bytecode with instructions and constants
  2. PC starts at instruction 0
  3. Each instruction is executed sequentially (unless jumps occur)
  4. Execution continues until HALT or end of instructions
  5. Final value is top of stack (or null if empty)
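The steps above can be sketched as a small fetch-execute loop. This is only an illustration under stated assumptions: it uses a reduced instruction set (PUSH, ADD, JUMP, HALT) and plain numbers instead of tagged values, and the names `MiniInstruction` and `runSketch` are hypothetical, not part of ReefVM.

```typescript
// Hypothetical reduced instruction set; the real VM has many more opcodes.
type MiniInstruction =
  | { op: 'PUSH'; operand: number } // index into the constants pool
  | { op: 'ADD' }
  | { op: 'JUMP'; operand: number } // relative offset
  | { op: 'HALT' }

function runSketch(instructions: MiniInstruction[], constants: number[]): number | null {
  const stack: number[] = []
  let pc = 0 // step 2: PC starts at instruction 0

  // Step 4: run until HALT or the end of the instruction list.
  while (pc < instructions.length) {
    const instr = instructions[pc]
    if (instr === undefined) break
    pc++ // advance first, so a jump offset applies to the already-incremented PC

    if (instr.op === 'HALT') break
    if (instr.op === 'PUSH') {
      const constant = constants[instr.operand]
      if (constant === undefined) throw new Error('Invalid constant index')
      stack.push(constant)
    } else if (instr.op === 'ADD') {
      const b = stack.pop()
      const a = stack.pop()
      if (a === undefined || b === undefined) throw new Error('Stack underflow')
      stack.push(a + b)
    } else {
      pc += instr.operand // JUMP: relative to the incremented PC
    }
  }

  // Step 5: the result is the top of the stack, or null if the stack is empty.
  const top = stack[stack.length - 1]
  return top === undefined ? null : top
}
```

With constants `[999, 42]`, the program `JUMP 2; PUSH 0; HALT; PUSH 1; HALT` skips the first PUSH/HALT pair and returns 42.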

Value Types

All runtime values are tagged unions:

type Value =
  | { type: 'null', value: null }
  | { type: 'boolean', value: boolean }
  | { type: 'number', value: number }
  | { type: 'string', value: string }
  | { type: 'array', value: Value[] }
  | { type: 'dict', value: Map<string, Value> }
  | { type: 'function', params: string[], defaults: Record<string, number>,
      body: number, parentScope: Scope, variadic: boolean, named: boolean }
  | { type: 'native', fn: NativeFunction, value: '<function>' }

Type Coercion

toNumber: number → identity, string → parseFloat (or 0), boolean → 1/0, others → 0

toString: string → identity, number → string, boolean → string, null → "null", function → "<function>", array → "[item, item]", dict → "{key: value, ...}"

isTrue: Only null and false are falsy. Everything else (including 0, "", empty arrays, empty dicts) is truthy.
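The three coercion rules could be written roughly as follows. This is a sketch, not the VM's actual code: it omits the function and native variants of `Value`, the helper names `coerceToNumber`, `coerceToString`, and `isTruthy` are hypothetical, and the exact rendering of nested arrays and dicts is an assumption.

```typescript
// Reduced Value union (function and native variants omitted for brevity).
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: Value[] }
  | { type: 'dict'; value: Map<string, Value> }

function coerceToNumber(v: Value): number {
  switch (v.type) {
    case 'number':
      return v.value
    case 'string': {
      const parsed = parseFloat(v.value)
      return Number.isNaN(parsed) ? 0 : parsed // unparseable strings become 0
    }
    case 'boolean':
      return v.value ? 1 : 0
    default:
      return 0 // null, arrays, dicts
  }
}

function coerceToString(v: Value): string {
  switch (v.type) {
    case 'string':
      return v.value
    case 'number':
    case 'boolean':
      return String(v.value)
    case 'null':
      return 'null'
    case 'array':
      return '[' + v.value.map((item) => coerceToString(item)).join(', ') + ']'
    default: {
      // Only 'dict' remains here.
      const pairs: string[] = []
      v.value.forEach((item, key) => pairs.push(key + ': ' + coerceToString(item)))
      return '{' + pairs.join(', ') + '}'
    }
  }
}

// Only null and false are falsy; 0, "", and empty collections are truthy.
function isTruthy(v: Value): boolean {
  return !(v.type === 'null' || (v.type === 'boolean' && v.value === false))
}
```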

Bytecode Format

type Bytecode = {
  instructions: Instruction[]
  constants: Constant[]
}

type Instruction = {
  op: OpCode
  operand?: number | string
}

type Constant =
  | Value
  | { type: 'function_def', params: string[], defaults: Record<string, number>,
      body: number, variadic: boolean, named: boolean }

Scope Chain

Variables are resolved through a linked scope chain:

class Scope {
  locals: Map<string, Value>;
  parent?: Scope;
}

Variable Resolution (LOAD):

  1. Check current scope's locals
  2. If not found, recursively check parent
  3. If not found anywhere, throw error

Variable Resolution (TRY_LOAD):

  1. Check current scope's locals
  2. If not found, recursively check parent
  3. If not found anywhere, return variable name as string (no error)

Variable Assignment (STORE):

  1. If variable exists in current scope, update it
  2. Else if variable exists in any parent scope, update it there
  3. Else create new variable in current scope

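This implements "assign to the nearest enclosing scope where the variable is defined" semantics; a new local is created only when no scope in the chain defines the name.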
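The LOAD, TRY_LOAD, and STORE rules listed above can be sketched on top of the `Scope` class. The method names (`findDefiningScope`, `load`, `tryLoad`, `store`) and the simplified `Value` type are assumptions made for illustration; the spec only defines the `locals` and `parent` fields.

```typescript
type Value = number | string // simplified stand-in for the tagged Value union

class Scope {
  locals: Map<string, Value> = new Map()
  parent: Scope | undefined

  constructor(parent?: Scope) {
    this.parent = parent
  }

  // Returns the first scope, starting here and walking outward, that defines `name`.
  findDefiningScope(name: string): Scope | undefined {
    let scope: Scope | undefined = this
    while (scope !== undefined) {
      if (scope.locals.has(name)) return scope
      scope = scope.parent
    }
    return undefined
  }

  // LOAD: throws when the variable is not found anywhere in the chain.
  load(name: string): Value {
    const value = this.findDefiningScope(name)?.locals.get(name)
    if (value === undefined) throw new Error(`Undefined variable: ${name}`)
    return value
  }

  // TRY_LOAD: falls back to the variable name as a string.
  tryLoad(name: string): Value {
    const value = this.findDefiningScope(name)?.locals.get(name)
    return value === undefined ? name : value
  }

  // STORE: update the scope that already defines the name, else create a local here.
  store(name: string, value: Value): void {
    const target = this.findDefiningScope(name) ?? this
    target.locals.set(name, value)
  }
}
```

In this sketch, storing `x` from an inner scope when only the outer scope defines `x` updates the outer binding and leaves the inner scope's locals untouched.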

Call Frames

type CallFrame = {
  returnAddress: number        // Where to resume after RETURN
  returnScope: Scope            // Scope to restore after RETURN
  isBreakTarget: boolean        // Can be targeted by BREAK
}

Exception Handlers

type ExceptionHandler = {
  catchAddress: number          // Where to jump on exception
  finallyAddress?: number       // Where to jump for finally block (always runs)
  callStackDepth: number        // Call stack depth when handler pushed
  scope: Scope                  // Scope to restore in catch block
}

Opcodes

Stack Operations

PUSH

Operand: Index into constants pool (number) Effect: Push constant onto stack Stack: [] → [value]

POP

Operand: None Effect: Discard top of stack Stack: [value] → []

DUP

Operand: None Effect: Duplicate top of stack Stack: [value] → [value, value]

SWAP

Operand: None Effect: Swap the top two values on the stack Stack: [value1, value2] → [value2, value1]

TYPE

Operand: None Effect: Pop value from stack, push its type as a string Stack: [value] → [typeString]

Returns the type of a value as a string.

Example:

PUSH 42
TYPE              ; Pushes "number"

Variable Operations

LOAD

Operand: Variable name (string) Effect: Push variable value onto stack Stack: [] → [value] Errors: Throws if variable not found in scope chain

STORE

Operand: Variable name (string) Effect: Store top of stack into variable (following scope chain rules) Stack: [value] → []

TRY_LOAD

Operand: Variable name (string) Effect: Push variable value onto stack if found, otherwise push variable name as string Stack: [] → [value | name] Errors: Never throws (unlike LOAD)

Behavior:

  1. Search for variable in scope chain (current scope and all parents)
  2. If found, push the variable's value onto stack
  3. If not found, push the variable name as a string value onto stack

Use Cases:

  • Shell-like behavior where strings don't need quotes

Example:

PUSH 42
STORE x
TRY_LOAD x          ; Pushes 42 (variable exists)
TRY_LOAD y          ; Pushes "y" (variable doesn't exist)

Arithmetic Operations

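All arithmetic operations pop two values, perform the operation, and push the result. SUB, MUL, DIV, and MOD always push a number; ADD can also push a string, array, or dict, depending on its operand types.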

ADD

Stack: [a, b] → [a + b]

Performs different operations depending on operand types:

  • If either operand is a string, converts both to strings and concatenates
  • Else if both operands are arrays, concatenates the arrays
  • Else if both operands are dicts, merges them (b's keys overwrite a's keys on conflict)
  • Else if both operands are numbers, performs numeric addition
  • Otherwise, throws an error

Examples:

  • 5 + 3 → 8 (numeric addition)
  • "hello" + " world" → "hello world" (string concatenation)
  • "count: " + 42 → "count: 42" (string concatenation)
  • 100 + " items" → "100 items" (string concatenation)
  • [1, 2, 3] + [4] → [1, 2, 3, 4] (array concatenation)
  • [1, 2] + [3, 4] → [1, 2, 3, 4] (array concatenation)
  • {a: 1} + {b: 2} → {a: 1, b: 2} (dict merge)
  • {a: 1, b: 2} + {b: 99} → {a: 1, b: 99} (dict merge, b overwrites)

Invalid operations (throw errors):

  • true + false → Error
  • null + 5 → Error
  • [1] + 5 → Error
  • {a: 1} + 5 → Error
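The ADD dispatch order can be sketched as below. This is an illustration under assumptions: the `Value` union is reduced (no function variants), and `display` and `addValues` are hypothetical helper names rather than ReefVM internals.

```typescript
// Reduced Value union (function and native variants omitted for brevity).
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: Value[] }
  | { type: 'dict'; value: Map<string, Value> }

// Simplified string conversion; arrays and dicts are rendered recursively.
function display(v: Value): string {
  if (v.type === 'array') return '[' + v.value.map((item) => display(item)).join(', ') + ']'
  if (v.type === 'dict') {
    const pairs: string[] = []
    v.value.forEach((item, key) => pairs.push(key + ': ' + display(item)))
    return '{' + pairs.join(', ') + '}'
  }
  return String(v.value) // null, booleans, numbers, and strings
}

function addValues(a: Value, b: Value): Value {
  // 1. Either operand is a string: convert both and concatenate.
  if (a.type === 'string' || b.type === 'string') {
    return { type: 'string', value: display(a) + display(b) }
  }
  // 2. Both arrays: concatenate into a new array.
  if (a.type === 'array' && b.type === 'array') {
    return { type: 'array', value: a.value.concat(b.value) }
  }
  // 3. Both dicts: merge, with b's keys overwriting a's.
  if (a.type === 'dict' && b.type === 'dict') {
    const merged = new Map(a.value)
    b.value.forEach((item, key) => merged.set(key, item))
    return { type: 'dict', value: merged }
  }
  // 4. Both numbers: numeric addition.
  if (a.type === 'number' && b.type === 'number') {
    return { type: 'number', value: a.value + b.value }
  }
  // 5. Anything else is an error.
  throw new Error(`Cannot add ${a.type} and ${b.type}`)
}
```

The string check comes first, which is why `100 + " items"` concatenates instead of failing as a mixed-type addition.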

SUB

Stack: [a, b] → [a - b]

MUL

Stack: [a, b] → [a * b]

DIV

Stack: [a, b] → [a / b]

MOD

Stack: [a, b] → [a % b]

Bitwise Operations

All bitwise operations coerce operands to 32-bit signed integers, perform the operation, and push the result as a number.

BIT_AND

Operand: None Stack: [a, b] → [a & b]

Performs bitwise AND operation. Both operands are coerced to 32-bit signed integers.

Example: 5 & 3 → 1 (binary: 0101 & 0011 → 0001)

BIT_OR

Operand: None Stack: [a, b] → [a | b]

Performs bitwise OR operation. Both operands are coerced to 32-bit signed integers.

Example: 5 | 3 → 7 (binary: 0101 | 0011 → 0111)

BIT_XOR

Operand: None Stack: [a, b] → [a ^ b]

Performs bitwise XOR (exclusive OR) operation. Both operands are coerced to 32-bit signed integers.

Example: 5 ^ 3 → 6 (binary: 0101 ^ 0011 → 0110)

BIT_SHL

Operand: None Stack: [a, b] → [a << b]

Performs left shift operation. Left operand is coerced to 32-bit signed integer, right operand determines shift amount (masked to 0-31).

Example: 5 << 2 → 20 (binary: 0101 shifted left 2 positions → 10100)

BIT_SHR

Operand: None Stack: [a, b] → [a >> b]

Performs sign-preserving right shift operation. Left operand is coerced to 32-bit signed integer, right operand determines shift amount (masked to 0-31). The sign bit is preserved (arithmetic shift).

Example:

  • 20 >> 2 → 5 (binary: 10100 shifted right 2 positions → 0101)
  • -20 >> 2 → -5 (sign bit preserved)

BIT_USHR

Operand: None Stack: [a, b] → [a >>> b]

Performs zero-fill right shift operation. Left operand is coerced to 32-bit signed integer, right operand determines shift amount (masked to 0-31). Zeros are shifted in from the left (logical shift).

Example:

  • -1 >>> 1 → 2147483647 (all bits shift right, zero fills from left)
  • -8 >>> 1 → 2147483644
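Assuming the VM is implemented in TypeScript (as the interop section suggests), the six bitwise opcodes map directly onto the JavaScript operators, which already perform the 32-bit coercion and shift-count masking described above. The `bitwiseOps` table is a hypothetical name used only for this sketch.

```typescript
// Each entry takes the two popped operands (a below b on the stack) and returns the result.
const bitwiseOps = {
  BIT_AND: (a: number, b: number): number => a & b,
  BIT_OR: (a: number, b: number): number => a | b,
  BIT_XOR: (a: number, b: number): number => a ^ b,
  BIT_SHL: (a: number, b: number): number => a << b, // shift count masked to 0-31
  BIT_SHR: (a: number, b: number): number => a >> b, // arithmetic shift, keeps the sign
  BIT_USHR: (a: number, b: number): number => a >>> b, // logical shift, zero-fills
}
```

Because the shift count is masked to five bits, `bitwiseOps.BIT_SHL(1, 33)` gives the same result as shifting by 1, and fractional operands such as 5.9 are truncated to integers before the operation.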

Comparison Operations

All comparison operations pop two values, compare, push boolean result.

EQ

Stack: [a, b] → [boolean] Note: Type-aware equality (deep comparison for arrays/dicts)

NEQ

Stack: [a, b] → [boolean]

LT

Stack: [a, b] → [boolean] Note: Numeric comparison (values coerced to numbers)

GT

Stack: [a, b] → [boolean] Note: Numeric comparison (values coerced to numbers)

LTE

Stack: [a, b] → [boolean] Note: Numeric comparison (values coerced to numbers)

GTE

Stack: [a, b] → [boolean] Note: Numeric comparison (values coerced to numbers)

Logical Operations

NOT

Stack: [a] → [!isTrue(a)]

Note on AND/OR: There are no AND/OR opcodes. Short-circuiting logical operations are implemented at the compiler level using JUMP instructions:

AND pattern (short-circuits if left side is false):

<evaluate left>
DUP
JUMP_IF_FALSE .end  # skip POP and <evaluate right>
POP
<evaluate right>
.end:

OR pattern (short-circuits if left side is true):

<evaluate left>
DUP
JUMP_IF_TRUE .end   # skip POP and <evaluate right>
POP
<evaluate right>
.end:

Control Flow

JUMP

Operand: Offset (number) Effect: Add offset to PC (relative jump) Stack: No change

JUMP_IF_FALSE

Operand: Offset (number) Effect: If top of stack is falsy, add offset to PC (relative jump) Stack: [condition] → []

JUMP_IF_TRUE

Operand: Offset (number) Effect: If top of stack is truthy, add offset to PC (relative jump) Stack: [condition] → []

BREAK

Operand: None Effect: Unwind call stack until frame with isBreakTarget = true, resume there Stack: No change Errors: Throws if no break target found

Behavior:

  1. Pop frames from call stack
  2. For each frame, restore its returnScope and returnAddress
  3. Stop when finding frame with isBreakTarget = true
  4. Resume execution at that frame's return address
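The unwinding steps can be sketched as a loop over the call stack. The state shape (`BreakState`), the `ScopeRef` stand-in, and the function name `executeBreak` are assumptions for illustration; the sketch also assumes the marked frame itself is popped before execution resumes.

```typescript
type ScopeRef = { label: string } // stand-in for a real Scope
type CallFrame = { returnAddress: number; returnScope: ScopeRef; isBreakTarget: boolean }
type BreakState = { callStack: CallFrame[]; pc: number; currentScope: ScopeRef }

function executeBreak(state: BreakState): void {
  let frame = state.callStack.pop()
  while (frame !== undefined) {
    // Restore the scope and return address recorded in each popped frame.
    state.currentScope = frame.returnScope
    state.pc = frame.returnAddress
    // Stop at the first frame marked as a break target.
    if (frame.isBreakTarget) return
    frame = state.callStack.pop()
  }
  // The whole stack was unwound without finding a target.
  throw new Error('BREAK with no break target')
}
```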

Note on CONTINUE: There is no CONTINUE opcode. Compilers implement continue behavior using JUMP with negative offsets to jump back to the loop start.

Exception Handling

PUSH_TRY

Operand: Catch block offset (number) Effect: Push exception handler Stack: No change

Registers a try block. If THROW occurs before POP_TRY, execution jumps to catch address.

PUSH_FINALLY

Operand: Finally block offset (number) Effect: Add finally address to most recent exception handler Stack: No change Errors: Throws if no exception handler to modify

Adds a finally block to the current try/catch. The finally block will execute whether an exception is thrown or not.

POP_TRY

Operand: None Effect: Pop exception handler (try block completed without exception) Stack: No change Errors: Throws if no handler to pop

Behavior:

  1. Pop exception handler
  2. Continue to next instruction

Notes:

  • The VM does NOT automatically jump to finally blocks on POP_TRY
  • The compiler must explicitly generate JUMP instructions to finally blocks when the try block completes normally
  • The compiler must ensure catch blocks also jump to finally when present
  • Finally blocks should end with normal control flow (no special terminator needed)

THROW

Operand: None Effect: Throw exception with error value from stack Stack: [errorValue] → (unwound)

Behavior:

  1. Pop error value from stack
  2. If no exception handlers, throw JavaScript Error with error message
  3. Otherwise, pop most recent exception handler
  4. Unwind call stack to handler's depth
  5. Restore handler's scope
  6. Push error value back onto stack
  7. If handler has finallyAddress, jump there; otherwise jump to catchAddress

Notes:

  • When THROW jumps to finally (if present), the error value remains on stack for the finally block
  • The compiler must structure catch/finally blocks appropriately to handle the error value
  • If finally is present, the catch block is typically entered via a jump from the finally block or through explicit compiler-generated control flow
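The seven THROW steps can be sketched as follows. The `ThrowState` shape, the `ScopeRef` stand-in, and the name `executeThrow` are hypothetical; values and frames are typed as `unknown` because their contents do not matter for the unwinding logic.

```typescript
type ScopeRef = { label: string } // stand-in for a real Scope
type ExceptionHandler = {
  catchAddress: number
  finallyAddress?: number
  callStackDepth: number
  scope: ScopeRef
}
type ThrowState = {
  stack: unknown[]
  callStack: unknown[]
  handlers: ExceptionHandler[]
  pc: number
  currentScope: ScopeRef
}

function executeThrow(state: ThrowState): void {
  const errorValue = state.stack.pop() // 1. pop the error value
  const handler = state.handlers.pop() // 3. pop the most recent handler
  if (handler === undefined) {
    // 2. no handlers: surface the error to the host as a JavaScript Error
    throw new Error(`Uncaught exception: ${String(errorValue)}`)
  }
  state.callStack.splice(handler.callStackDepth) // 4. drop frames above the handler's depth
  state.currentScope = handler.scope // 5. restore the handler's scope
  state.stack.push(errorValue) // 6. make the error available to catch/finally
  // 7. prefer the finally block when one is registered
  state.pc = handler.finallyAddress !== undefined ? handler.finallyAddress : handler.catchAddress
}
```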

Function Operations

MAKE_FUNCTION

Operand: Index into constants pool (number) Effect: Create function value, capturing current scope Stack: [] → [function]

The constant must be a function_def with:

  • params: Parameter names
  • defaults: Map of param names to constant indices for default values
  • body: Instruction address of function body
  • variadic: If true, second-to-last param (if named is also true) or last param collects remaining positional args as array
  • named: If true, last param collects unmatched named args as dict

The created function captures currentScope as its parentScope.

CALL

Operand: None

Stack: [fn, arg1, arg2, ..., name1, val1, name2, val2, ..., positionalCount, namedCount] → [returnValue]

Behavior:

  1. Pop namedCount from stack (top of stack)
  2. Pop positionalCount from stack
  3. Pop named arguments (name/value pairs) from stack
  4. Pop positional arguments from stack
  5. Pop function from stack
  6. If function is native:
    • Mark current frame (if exists) as break target
    • Call native function with positional args
    • Push return value onto stack
    • Done (skip the remaining steps)
  7. Mark current frame (if exists) as break target (isBreakTarget = true)
  8. Push new call frame with current PC and scope
  9. Create new scope with function's parentScope as parent
  10. Bind parameters:
    • For regular functions: bind params by position, then by name, then defaults, then null
    • For variadic functions: bind fixed params, collect rest into array
    • For functions with named: true: bind fixed params by position/name, collect unmatched named args into dict
  11. Set currentScope to new scope
  12. Jump to function body

Parameter Binding Priority (for fixed params):

  1. Named argument (if provided and matches param name)
  2. Positional argument (if provided)
  3. Default value (if defined)
  4. Null

Null Value Semantics:

  • Passing null as an argument explicitly triggers the default value (if one exists)
  • This allows callers to "opt-in" to defaults even when providing arguments positionally
  • If no default exists, null is bound as-is
  • This applies to both ReefVM functions and native TypeScript functions
  • Example: fn(null, 20) where fn(x=10, y) binds x=10 (default triggered), y=20
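The binding priority and the null-triggers-default rule for fixed parameters can be sketched as below. This is a simplification under assumptions: `bindFixedParams` and `Arg` are hypothetical names, defaults are passed as already-resolved values rather than constant-pool indices, and variadic and named-collection parameters are left out.

```typescript
type Arg = number | string | boolean | null // simplified argument values

function bindFixedParams(
  params: string[],
  defaults: Map<string, Arg>, // the spec stores constant indices; resolved values are used here
  positional: Arg[],
  named: Map<string, Arg>,
): Map<string, Arg> {
  const bound = new Map<string, Arg>()
  params.forEach((param, index) => {
    let value: Arg = null
    if (named.has(param)) {
      value = named.get(param) ?? null // 1. named argument wins
    } else if (index < positional.length) {
      value = positional[index] ?? null // 2. then the positional argument
    }
    if (value === null && defaults.has(param)) {
      value = defaults.get(param) ?? null // 3. missing or explicit null falls back to the default
    }
    bound.set(param, value) // 4. otherwise null is bound as-is
  })
  return bound
}
```

For `fn(x=10, y)` called as `fn(null, 20)`, this binds x to 10 and y to 20, matching the example above.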

Named Args Handling:

  • Named args that match fixed parameter names are bound to those params
  • If the function has named: true, remaining named args (that don't match any fixed param) are collected into the last parameter as a dict
  • This allows flexible calling: fn(x=10, y=20, extra=30) where extra goes to the named args dict
  • Native functions support named arguments - parameter names are extracted from the function signature at call time
  • Passing null via named args also triggers defaults: fn(x=null) triggers x's default

Errors: Throws if top of stack is not a function (or native function)

TAIL_CALL

Operand: None Effect: Same as CALL, but reuses current call frame Stack: [fn, arg1, arg2, ..., name1, val1, name2, val2, ..., positionalCount, namedCount] → [returnValue]

Behavior: Identical to CALL except:

  • Does NOT push a new call frame
  • Replaces currentScope instead of creating nested scope
  • Enables unbounded tail recursion without stack overflow

RETURN

Operand: None Effect: Return from function Stack: [returnValue] → (restored stack with returnValue on top)

Behavior:

  1. Pop return value (or null if stack empty)
  2. Pop call frame
  3. Restore scope from frame
  4. Set PC to frame's return address
  5. Push return value onto stack

Errors: Throws if no call frame to return from

TRY_CALL

Operand: Variable name (string) Effect: Conditionally call function or push value/string onto stack Stack: [] → [returnValue | value | name] Errors: Never throws (unlike CALL)

Behavior:

  1. Look up variable by name in scope chain
  2. If variable is a function: Call it with 0 arguments (no positional, no named) and push the returned value onto the stack.
  3. If variable exists but is not a function: Push the variable's value onto stack
  4. If variable doesn't exist: Push the variable name as a string onto stack

Use Cases:

  • DSL/templating languages with "call if callable, otherwise use as literal" semantics
  • Shell-like behavior where unknown identifiers become strings
  • Optional function hooks (call if defined, silently skip if not)

Implementation Note:

  • Uses intentional fall-through in VM switch statement from TRY_CALL to CALL case
  • When function is found, stacks are set up to match CALL's expectations exactly
  • No break target marking or frame pushing occurs when non-function value is found

Example:

MAKE_FUNCTION () .body
STORE greet
PUSH 42
STORE answer
TRY_CALL greet     ; Calls function greet(), returns its value
TRY_CALL answer    ; Pushes 42 (number value)
TRY_CALL unknown   ; Pushes "unknown" (string)

.body:
  PUSH "Hello!"
  RETURN

Array Operations

MAKE_ARRAY

Operand: Number of items (number) Effect: Create array from N stack items Stack: [item1, item2, ..., itemN] → [array]

Items are popped in reverse order (item1 is array[0]).

ARRAY_GET

Operand: None Effect: Get array element at index Stack: [array, index] → [value] Errors: Throws if not array or index out of bounds

Index is coerced to number and floored.

ARRAY_SET

Operand: None Effect: Set array element at index (mutates array) Stack: [array, index, value] → [] Errors: Throws if not array or index out of bounds

ARRAY_PUSH

Operand: None Effect: Append value to end of array (mutates array, grows by 1) Stack: [array, value] → [] Errors: Throws if not array

ARRAY_LEN

Operand: None Effect: Get array length Stack: [array] → [length] Errors: Throws if not array

Dictionary Operations

MAKE_DICT

Operand: Number of key-value pairs (number) Effect: Create dict from N key-value pairs Stack: [key1, val1, key2, val2, ...] → [dict]

Keys are coerced to strings.

DICT_GET

Operand: None Effect: Get dict value for key Stack: [dict, key] → [value]

Returns null if key not found. Key is coerced to string. Errors: Throws if not dict

DICT_SET

Operand: None Effect: Set dict value for key (mutates dict) Stack: [dict, key, value] → []

Key is coerced to string. Errors: Throws if not dict

DICT_HAS

Operand: None Effect: Check if key exists in dict Stack: [dict, key] → [boolean]

Key is coerced to string. Errors: Throws if not dict

Unified Access

DOT_GET

Operand: None Effect: Get value from array or dict Stack: [array|dict, index|key] → [value]

Behavior:

  • If target is array: coerce index to number and access array[index]
  • If target is dict: coerce key to string and access dict.get(key)
  • Returns null if index out of bounds or key not found

Errors: Throws if target is not array or dict
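A possible shape for the DOT_GET dispatch is sketched below. The `dotGet` name and the reduced `Value` union are assumptions, and key coercion is simplified: numeric strings are parsed for array access, and any other key that does not resolve to a valid index simply misses and yields null.

```typescript
// Reduced Value union (function and native variants omitted for brevity).
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: Value[] }
  | { type: 'dict'; value: Map<string, Value> }

function dotGet(target: Value, key: Value): Value {
  let found: Value | undefined
  if (target.type === 'array') {
    // Coerce the index to a number; keys that are not valid indices miss.
    const index = key.type === 'number' ? key.value : parseFloat(String(key.value))
    found = target.value[index]
  } else if (target.type === 'dict') {
    // Coerce the key to a string before the lookup.
    found = target.value.get(String(key.value))
  } else {
    throw new Error(`Cannot index into ${target.type}`)
  }
  // Out-of-bounds indices and missing keys both produce null instead of an error.
  if (found === undefined) return { type: 'null', value: null }
  return found
}
```

Unlike ARRAY_GET, which throws on an out-of-bounds index, this accessor returns null, which is what makes chained access such as `obj.users.0.name` tolerant of missing entries up to the point where a non-indexable value is reached.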

Use Cases:

  • Unified syntax for accessing both arrays and dicts
  • Chaining access operations: obj.users.0.name
  • Generic accessor that works with any indexable type

Example:

; Array access
PUSH 10
PUSH 20
PUSH 30
MAKE_ARRAY #3
PUSH 1
DOT_GET           ; → 20

; Dict access
PUSH 'name'
PUSH 'Alice'
MAKE_DICT #1
PUSH 'name'
DOT_GET           ; → 'Alice'

; Chained access
; dict['users'][0]
LOAD dict
PUSH 'users'
DOT_GET
PUSH 0
DOT_GET

String Operations

STR_CONCAT

Operand: Number of values to concatenate (number) Effect: Concatenate N values from stack into a single string Stack: [val1, val2, ..., valN] → [string]

Behavior:

  1. Pop N values from stack (in reverse order)
  2. Convert each value to string using toString()
  3. Concatenate all strings in order (val1 + val2 + ... + valN)
  4. Push resulting string onto stack

Type Coercion:

  • Numbers → string representation (e.g., 42 → "42")
  • Booleans → "true" or "false"
  • Null → "null"
  • Strings → identity
  • Arrays → "[item, item]" format
  • Dicts → "{key: value, ...}" format
  • Functions → "<function>"

Use Cases:

  • Building dynamic strings from multiple parts
  • Template string interpolation
  • String formatting with mixed types

Composability:

  • Results can be concatenated again with additional STR_CONCAT operations
  • Can leave values on stack (only consumes specified count)

Example:

PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3        ; → "Hello World"

PUSH "Count: "
PUSH 42
PUSH ", Active: "
PUSH true
STR_CONCAT #4        ; → "Count: 42, Active: true"

Edge Cases:

  • STR_CONCAT #0 produces empty string ""
  • STR_CONCAT #1 converts single value to string
  • If stack has fewer values than count, behavior depends on implementation (may use empty strings or throw)
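The pop-and-join behavior can be sketched with a plain array as the value stack. The `strConcat` name and the `Primitive` type are assumptions; values are limited to primitives, and the sketch picks one of the two permitted underflow behaviors (throwing).

```typescript
type Primitive = string | number | boolean | null // simplified stack values

function strConcat(stack: Primitive[], count: number): void {
  // Assumption: throw on underflow (the spec leaves this to the implementation).
  if (count > stack.length) throw new Error('Stack underflow in STR_CONCAT')
  // splice removes the top `count` values and keeps them in val1..valN order.
  const parts = stack.splice(stack.length - count, count)
  // Convert each value to a string and push the joined result.
  stack.push(parts.map((part) => String(part)).join(''))
}
```

Values below the top `count` entries stay on the stack untouched, and a count of 0 pushes an empty string.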

TypeScript Interop

Native TypeScript functions are registered into the VM's scope and accessed via regular LOAD/CALL operations. They behave identically to Reef functions from the bytecode perspective.

Registration:

const vm = new VM(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`
})

// Or after construction:
vm.set('multiply', (a: number, b: number) => a * b)

Usage in Bytecode:

LOAD add          ; Load native function from scope
PUSH 5
PUSH 10
PUSH 2            ; positionalCount
PUSH 0            ; namedCount
CALL              ; Call it like any other function

Native Function Types:

  1. Auto-wrapped functions (via vm.set()): Accept and return native TypeScript types (number, string, boolean, array, object, etc.). The VM automatically converts between Value types and native types.

  2. Value-based functions (via vm.setValueFunction()): Accept and return Value types directly for full control over type handling.

Auto-Wrapping Behavior:

  • Parameters: Value → native type (number, string, boolean, array, object, null, RegExp)
  • Return value: native type → Value
  • Supports sync and async functions
  • Objects convert to dicts, arrays convert to Value arrays
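The conversions performed by auto-wrapping might look roughly like the following pair of helpers. This is a guess at the general shape, not the VM's actual code: `toValue` and `fromValue` are hypothetical names, and RegExp, functions, and async results are not handled.

```typescript
// Reduced Value union (function and native variants omitted for brevity).
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: Value[] }
  | { type: 'dict'; value: Map<string, Value> }

// Native return value → Value
function toValue(native: unknown): Value {
  if (native === null || native === undefined) return { type: 'null', value: null }
  if (typeof native === 'boolean') return { type: 'boolean', value: native }
  if (typeof native === 'number') return { type: 'number', value: native }
  if (typeof native === 'string') return { type: 'string', value: native }
  if (Array.isArray(native)) return { type: 'array', value: native.map((item) => toValue(item)) }
  if (typeof native === 'object') {
    // Plain objects become dicts keyed by their own enumerable properties.
    const dict = new Map<string, Value>()
    for (const [key, item] of Object.entries(native)) dict.set(key, toValue(item))
    return { type: 'dict', value: dict }
  }
  throw new Error(`Cannot convert ${typeof native} to a Value`)
}

// Value argument → native type
function fromValue(v: Value): unknown {
  if (v.type === 'array') return v.value.map((item) => fromValue(item))
  if (v.type === 'dict') {
    const obj: Record<string, unknown> = {}
    v.value.forEach((item, key) => {
      obj[key] = fromValue(item)
    })
    return obj
  }
  return v.value // null, boolean, number, string
}
```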

Named Arguments:

  • Native functions support named arguments by extracting parameter names from the function signature
  • Parameter binding follows the same priority as Reef functions: named arg > positional arg > default > null
  • TypeScript rest parameters (...args) are supported and behave like Reef variadic parameters

Examples:

// Auto-wrapped native types
vm.set('add', (a: number, b: number) => a + b)
vm.set('greet', (name: string) => `Hello, ${name}!`)
vm.set('range', (n: number) => Array.from({ length: n }, (_, i) => i))

// With defaults
vm.set('greet', (name: string, greeting = 'Hello') => {
  return `${greeting}, ${name}!`
})

// Variadic functions
vm.set('sum', (...nums: number[]) => {
  return nums.reduce((acc, n) => acc + n, 0)
})

// Value-based for custom logic
vm.setValueFunction('customOp', (a: Value, b: Value): Value => {
  return { type: 'number', value: toNumber(a) + toNumber(b) }
})

// Async functions
vm.set('fetchData', async (url: string) => {
  const response = await fetch(url)
  return response.json()
})

Calling with Named Arguments:

; Call with positional args
LOAD greet
PUSH "Alice"
PUSH 1            ; positionalCount
PUSH 0            ; namedCount
CALL              ; → "Hello, Alice!"

; Call with named args
LOAD greet
PUSH "name"
PUSH "Bob"
PUSH "greeting"
PUSH "Hi"
PUSH 0            ; positionalCount
PUSH 2            ; namedCount
CALL              ; → "Hi, Bob!"

Special

HALT

Operand: None Effect: Stop execution Stack: No change

Label Syntax

The bytecode format supports labels for improved readability:

Label Definition: .label_name: marks an instruction position

Label Reference: .label_name in operands (e.g., JUMP .loop_start)

Labels are resolved to numeric offsets during parsing. The original numeric offset syntax (#N) is still supported for backwards compatibility.

Example with labels:

JUMP .skip
.middle:
  PUSH 999
  HALT
.skip:
  PUSH 42
  HALT

Equivalent with numeric offsets:

JUMP #2
PUSH 999
HALT
PUSH 42
HALT
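Label resolution for jumps can be sketched as a two-pass rewrite. The function below is a hypothetical illustration, not the actual parser: it handles only single-operand instructions, does not strip comments, and treats every label operand as a relative jump target (so MAKE_FUNCTION, whose body operand is an instruction address, is out of scope).

```typescript
function resolveLabels(lines: string[]): string[] {
  const labelIndex = new Map<string, number>()
  const instructions: string[] = []

  // Pass 1: record the instruction index each label points at; labels emit no instruction.
  for (const raw of lines) {
    const line = raw.trim()
    if (line === '') continue
    if (line.startsWith('.') && line.endsWith(':')) {
      labelIndex.set(line.slice(0, -1), instructions.length)
    } else {
      instructions.push(line)
    }
  }

  // Pass 2: replace label operands with offsets relative to the incremented PC.
  return instructions.map((line, index) => {
    const [op, operand] = line.split(/\s+/)
    if (operand === undefined || !operand.startsWith('.')) return line
    const target = labelIndex.get(operand)
    if (target === undefined) throw new Error(`Unknown label: ${operand}`)
    return `${op} #${target - (index + 1)}`
  })
}
```

Applied to the labeled example above, the first instruction becomes `JUMP #2`: the label `.skip` resolves to instruction 3, and the offset is measured from instruction 1, the PC value after the jump itself has been fetched.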

Common Bytecode Patterns

If-Else Statement

LOAD 'x'
PUSH 5
GT
JUMP_IF_FALSE .else
  # then block
  JUMP .end
.else:
  # else block
.end:

While Loop

.loop_start:
  # condition
  JUMP_IF_FALSE .loop_end
  # body
  JUMP .loop_start
.loop_end:

Function Definition

MAKE_FUNCTION <params> .function_body
STORE 'functionName'
JUMP .skip_body
.function_body:
  # function code
  RETURN
.skip_body:

Try-Catch

PUSH_TRY .catch
  ; try block
POP_TRY
JUMP .end
.catch:
  STORE 'errorVar'   ; Error is on stack
  ; catch block
.end:

Try-Catch-Finally

PUSH_TRY .catch
PUSH_FINALLY .finally
  ; try block
POP_TRY
JUMP .finally
.catch:
  STORE 'errorVar'   ; Error is on stack
  ; catch block
  JUMP .finally
.finally:
  ; finally block (executes in both cases)
.end:

Named Function Call

LOAD 'mkdir'
PUSH 'src/bin'        # positional arg
PUSH 'recursive'      # name
PUSH true             # value
PUSH 1                # positionalCount
PUSH 1                # namedCount
CALL

Null Triggering Default Values

# Function: greet(name='Guest', greeting='Hello')
MAKE_FUNCTION (name='Guest' greeting='Hello') .greet_body
STORE 'greet'
JUMP .main
.greet_body:
  LOAD 'greeting'
  PUSH ', '
  ADD
  LOAD 'name'
  ADD
  RETURN
.main:
  # Call with null for first param - triggers default
  LOAD 'greet'
  PUSH null            # name will use default 'Guest'
  PUSH 'Hi'            # greeting='Hi' (provided)
  PUSH 2               # positionalCount
  PUSH 0               # namedCount
  CALL                 # Returns "Hi, Guest"

Tail Recursive Function

MAKE_FUNCTION (n acc) .factorial_body
STORE 'factorial'
JUMP .main
.factorial_body:
  LOAD 'n'
  PUSH 0
  EQ
  JUMP_IF_FALSE .recurse
  LOAD 'acc'
  RETURN
.recurse:
  LOAD 'factorial'
  LOAD 'n'
  PUSH 1
  SUB
  LOAD 'n'
  LOAD 'acc'
  MUL
  PUSH 2             # positionalCount
  PUSH 0             # namedCount
  TAIL_CALL          # No stack growth!
.main:
  LOAD 'factorial'
  PUSH 5
  PUSH 1
  PUSH 2             # positionalCount
  PUSH 0             # namedCount
  CALL

Error Conditions

Runtime Errors

All of these should throw errors:

  1. Undefined Variable: LOAD of non-existent variable
  2. Type Mismatch: ARRAY_GET on non-array, DICT_GET on non-dict, CALL on non-function
  3. Index Out of Bounds: ARRAY_GET/SET with invalid index
  4. Stack Underflow: Arithmetic ops without enough operands
  5. Uncaught Exception: THROW with no exception handlers
  6. Break Outside Loop: BREAK with no break target
  7. Return Outside Function: RETURN with no call frame
  8. Mismatched Handler: POP_TRY with no handler
  9. Invalid Constant: PUSH with invalid constant index
  10. Invalid Function Definition: MAKE_FUNCTION with non-function_def constant

Edge Cases

Empty Stack

  • Arithmetic/comparison ops on empty stack should throw
  • RETURN with empty stack returns null
  • HALT with empty stack returns null

Null Values

  • Arithmetic with null coerces to 0, except ADD, which throws on a null operand unless the other operand is a string
  • Comparisons with null work normally
  • Null is falsy

Scope Shadowing

  • Variables in inner scopes shadow outer scopes during LOAD
  • STORE updates the nearest enclosing scope where the variable is defined, or creates it in the current scope if it is defined nowhere

Function Parameter Binding

  • Missing positional args → use named args → use defaults → use null
  • Extra positional args → collected by variadic parameter or ignored
  • Extra named args → collected by named args parameter (if named: true) or ignored
  • Named arg matching is case-sensitive

Tail Call Optimization

  • TAIL_CALL reuses frame, so return address is from original caller
  • Multiple tail calls in sequence never grow stack
  • TAIL_CALL can call different function (not just self-recursive)

Break/Continue Semantics

  • BREAK unwinds to frame that called the iterator function
  • Multiple nested function calls: break exits all of them until reaching marked frame
  • CONTINUE is implemented by the compiler using JUMPs

Exception Unwinding

  • THROW unwinds call stack to handler's depth
  • Exception handlers form a stack (nested try blocks)
  • Error value on stack is available in catch/finally blocks
  • When THROW occurs and handler has finallyAddress, VM jumps to finally first
  • Compiler is responsible for structuring control flow so finally executes in all cases
  • Finally typically executes after try (if no exception) or after catch (if exception), but control flow is compiler-managed

VM Initialization

// Register native functions during construction
const vm = new VM(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`
})

// Or register after construction
vm.set('multiply', (a: number, b: number) => a * b)

// Or use Value-based functions
vm.setValueFunction('customOp', (a: Value, b: Value): Value => {
  return { type: 'number', value: toNumber(a) + toNumber(b) }
})

const result = await vm.run()

Testing Considerations

Unit Tests Should Cover

  1. Each opcode individually with minimal setup
  2. Type coercion for arithmetic, comparison, and logical ops
  3. Scope chain resolution (local, parent, global)
  4. Call frames (nested calls, return values)
  5. Exception handling (nested try blocks, unwinding, finally blocks)
  6. Break/continue (nested functions, iterator pattern)
  7. Closures (capturing variables, multiple nesting levels)
  8. Tail calls (self-recursive, mutual recursion)
  9. Parameter binding (positional, named, defaults, variadic, named args collection, combinations)
  10. Array/dict operations (creation, access, mutation)
  11. Error conditions (all error cases listed above)
  12. Edge cases (empty stack, null values, shadowing, etc.)

Integration Tests Should Cover

  1. Recursive functions (factorial, fibonacci)
  2. Iterator pattern (each with break)
  3. Closure examples (counters, adder factories)
  4. Exception examples (try/catch/throw chains)
  5. Complex scope (deeply nested functions)
  6. Mixed features (variadic + defaults + named args)

Property-Based Tests Should Cover

  1. Stack integrity (stack size matches expectations after ops)
  2. Scope integrity (variables remain accessible)
  3. Frame integrity (call stack unwinds correctly)

Version History

  • 1.0 (2024): Initial specification

Notes

  • PC increment happens after each instruction execution
  • Jump instructions use relative offsets (added to current PC after increment)
  • All async operations (native functions) must be awaited
  • Arrays and dicts are mutable (pass by reference)
  • Functions are immutable values
  • The VM is single-threaded (no concurrency primitives)