ReefVM Specification
Version 1.0
Overview
The ReefVM is a stack-based bytecode virtual machine designed for the Shrimp programming language. It supports closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.
Architecture
Components
- Value Stack: Operand stack for computation
- Call Stack: Call frames for function invocations
- Exception Handlers: Stack of try/catch handlers
- Scope Chain: Linked scopes for lexical variable resolution
- Program Counter (PC): Current instruction index
- Constants Pool: Immutable values and function metadata
- Native Function Registry: External functions callable from Shrimp
Execution Model
- VM loads bytecode with instructions and constants
- PC starts at instruction 0
- Each instruction is executed sequentially (unless jumps occur)
- Execution continues until HALT or end of instructions
- Final value is top of stack (or null if empty)
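The execution model above can be sketched as a small fetch-execute loop. This is a minimal illustration, not the actual ReefVM implementation: `MiniOp`, `MiniInstruction`, and `runMini` are hypothetical names, only four opcodes are modeled, and constants are plain numbers rather than tagged values. It assumes relative jump offsets are added after the PC increment.

```typescript
// Hypothetical reduced instruction set; the real VM has many more opcodes.
type MiniOp = 'PUSH' | 'ADD' | 'JUMP' | 'HALT';
type MiniInstruction = { op: MiniOp; operand?: number };

function runMini(instructions: MiniInstruction[], constants: number[]): number | null {
  const valueStack: number[] = [];
  let pc = 0; // PC starts at instruction 0

  while (pc < instructions.length) {
    const instruction = instructions[pc];
    const operand = instruction.operand ?? 0;
    let jumpOffset = 0;

    if (instruction.op === 'HALT') break;
    if (instruction.op === 'PUSH') {
      valueStack.push(constants[operand]);
    } else if (instruction.op === 'ADD') {
      const b = valueStack.pop();
      const a = valueStack.pop();
      if (a === undefined || b === undefined) throw new Error('Stack underflow');
      valueStack.push(a + b);
    } else {
      jumpOffset = operand; // JUMP: relative offset
    }

    pc += 1 + jumpOffset; // increment first, then apply any jump offset
  }

  // Final value is the top of the stack, or null if the stack is empty.
  return valueStack.length > 0 ? valueStack[valueStack.length - 1] : null;
}

const sumProgram: MiniInstruction[] = [
  { op: 'PUSH', operand: 0 },
  { op: 'PUSH', operand: 1 },
  { op: 'ADD' },
  { op: 'HALT' },
];
console.log(runMini(sumProgram, [2, 3])); // 5
```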
Value Types
All runtime values are tagged unions:
type Value =
| { type: 'null', value: null }
| { type: 'boolean', value: boolean }
| { type: 'number', value: number }
| { type: 'string', value: string }
| { type: 'array', value: Value[] }
| { type: 'dict', value: Map<string, Value> }
| { type: 'function', params: string[], defaults: Record<string, number>,
body: number, parentScope: Scope, variadic: boolean, named: boolean }
Type Coercion
toNumber: number → identity, string → parseFloat (or 0), boolean → 1/0, others → 0
toString: string → identity, number → string, boolean → string, null → "null", function → "", array → "[item, item]", dict → "{key: value, ...}"
isTrue: Only null and false are falsy. Everything else (including 0, "", empty arrays, empty dicts) is truthy.
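As a rough sketch of the toNumber and isTrue rules above (toString is left out), the following uses a reduced value type named `CoercibleValue`, which is a hypothetical stand-in that omits functions. The treatment of unparseable strings as 0 follows the "parseFloat (or 0)" rule.

```typescript
// Reduced stand-in for the spec's Value union (functions omitted).
type CoercibleValue =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: CoercibleValue[] }
  | { type: 'dict'; value: Map<string, CoercibleValue> };

function toNumber(v: CoercibleValue): number {
  switch (v.type) {
    case 'number':
      return v.value;
    case 'string': {
      const parsed = parseFloat(v.value);
      return Number.isNaN(parsed) ? 0 : parsed; // unparseable strings become 0
    }
    case 'boolean':
      return v.value ? 1 : 0;
    default:
      return 0; // null, arrays, dicts
  }
}

function isTrue(v: CoercibleValue): boolean {
  if (v.type === 'null') return false;
  if (v.type === 'boolean') return v.value;
  return true; // 0, "", empty arrays, and empty dicts are all truthy
}
```

Note that truthiness here differs from JavaScript's: `isTrue` applied to the number 0 or the empty string returns true.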
Bytecode Format
type Bytecode = {
instructions: Instruction[]
constants: Constant[]
}
type Instruction = {
op: OpCode
operand?: number | string
}
type Constant =
| Value
| { type: 'function_def', params: string[], defaults: Record<string, number>,
body: number, variadic: boolean, named: boolean }
Scope Chain
Variables are resolved through a linked scope chain:
class Scope {
locals: Map<string, Value>;
parent?: Scope;
}
Variable Resolution (LOAD):
- Check current scope's locals
- If not found, recursively check parent
- If not found anywhere, throw error
Variable Resolution (TRY_LOAD):
- Check current scope's locals
- If not found, recursively check parent
- If not found anywhere, return variable name as string (no error)
Variable Assignment (STORE):
- If variable exists in current scope, update it
- Else if variable exists in any parent scope, update it there
- Else create new variable in current scope
This implements "assign to the nearest enclosing scope where defined" semantics: the current scope is checked first, then each parent in turn.
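The LOAD, TRY_LOAD, and STORE rules can be sketched with one shared lookup helper. This is an illustrative sketch only: `ScopeSketch`, `ScopeValue`, and `findOwner` are hypothetical names, and values are simplified to numbers and strings instead of the full Value union.

```typescript
type ScopeValue = number | string; // simplified stand-in for Value

class ScopeSketch {
  locals = new Map<string, ScopeValue>();
  parent: ScopeSketch | undefined;

  constructor(parent?: ScopeSketch) {
    this.parent = parent;
  }

  // Returns the nearest scope (this one or an ancestor) that defines `name`.
  findOwner(name: string): ScopeSketch | undefined {
    if (this.locals.has(name)) return this;
    return this.parent ? this.parent.findOwner(name) : undefined;
  }

  // LOAD: throws when the variable is not defined anywhere in the chain.
  load(name: string): ScopeValue {
    const owner = this.findOwner(name);
    if (!owner) throw new Error(`Undefined variable: ${name}`);
    return owner.locals.get(name) as ScopeValue;
  }

  // TRY_LOAD: falls back to the variable name as a string.
  tryLoad(name: string): ScopeValue {
    const owner = this.findOwner(name);
    return owner ? (owner.locals.get(name) as ScopeValue) : name;
  }

  // STORE: update where the variable already lives, else create it here.
  store(name: string, value: ScopeValue): void {
    (this.findOwner(name) ?? this).locals.set(name, value);
  }
}

const globalScope = new ScopeSketch();
globalScope.store('x', 1);
const innerScope = new ScopeSketch(globalScope);
innerScope.store('x', 2); // x exists in the parent, so the parent is updated
innerScope.store('y', 3); // y is new, so it is created in the inner scope
console.log(globalScope.load('x')); // 2
```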
Call Frames
type CallFrame = {
returnAddress: number // Where to resume after RETURN
returnScope: Scope // Scope to restore after RETURN
isBreakTarget: boolean // Can be targeted by BREAK
}
Exception Handlers
type ExceptionHandler = {
catchAddress: number // Where to jump on exception
finallyAddress?: number // Where to jump for finally block (always runs)
callStackDepth: number // Call stack depth when handler pushed
scope: Scope // Scope to restore in catch block
}
Opcodes
Stack Operations
PUSH
Operand: Index into constants pool (number)
Effect: Push constant onto stack
Stack: [] → [value]
POP
Operand: None
Effect: Discard top of stack
Stack: [value] → []
DUP
Operand: None
Effect: Duplicate top of stack
Stack: [value] → [value, value]
Variable Operations
LOAD
Operand: Variable name (string)
Effect: Push variable value onto stack
Stack: [] → [value]
Errors: Throws if variable not found in scope chain
STORE
Operand: Variable name (string)
Effect: Store top of stack into variable (following scope chain rules)
Stack: [value] → []
TRY_LOAD
Operand: Variable name (string)
Effect: Push variable value onto stack if found, otherwise push variable name as string
Stack: [] → [value | name]
Errors: Never throws (unlike LOAD)
Behavior:
- Search for variable in scope chain (current scope and all parents)
- If found, push the variable's value onto stack
- If not found, push the variable name as a string value onto stack
Use Cases:
- Shell-like behavior where strings don't need quotes
Example:
PUSH 42
STORE x
TRY_LOAD x ; Pushes 42 (variable exists)
TRY_LOAD y ; Pushes "y" (variable doesn't exist)
Arithmetic Operations
All arithmetic operations pop two values, perform the operation, and push the result as a number.
ADD
Stack: [a, b] → [a + b]
Note: Only for numbers (use separate string concat if needed)
SUB
Stack: [a, b] → [a - b]
MUL
Stack: [a, b] → [a * b]
DIV
Stack: [a, b] → [a / b]
MOD
Stack: [a, b] → [a % b]
Comparison Operations
All comparison operations pop two values, compare, push boolean result.
EQ
Stack: [a, b] → [boolean]
Note: Type-aware equality (deep comparison for arrays/dicts)
NEQ
Stack: [a, b] → [boolean]
LT
Stack: [a, b] → [boolean]
Note: Numeric comparison (values coerced to numbers)
GT
Stack: [a, b] → [boolean]
Note: Numeric comparison (values coerced to numbers)
LTE
Stack: [a, b] → [boolean]
Note: Numeric comparison (values coerced to numbers)
GTE
Stack: [a, b] → [boolean]
Note: Numeric comparison (values coerced to numbers)
Logical Operations
NOT
Stack: [a] → [!isTrue(a)]
Note on AND/OR: There are no AND/OR opcodes. Short-circuiting logical operations are implemented at the compiler level using JUMP instructions:
AND pattern (short-circuits if left side is false):
<evaluate left>
DUP
JUMP_IF_FALSE .end # skip POP and <evaluate right>
POP
<evaluate right>
.end:
OR pattern (short-circuits if left side is true):
<evaluate left>
DUP
JUMP_IF_TRUE .end # skip POP and <evaluate right>
POP
<evaluate right>
.end:
Control Flow
JUMP
Operand: Offset (number)
Effect: Add offset to PC (relative jump)
Stack: No change
JUMP_IF_FALSE
Operand: Offset (number)
Effect: If top of stack is falsy, add offset to PC (relative jump)
Stack: [condition] → []
JUMP_IF_TRUE
Operand: Offset (number)
Effect: If top of stack is truthy, add offset to PC (relative jump)
Stack: [condition] → []
BREAK
Operand: None
Effect: Unwind call stack until frame with isBreakTarget = true, resume there
Stack: No change
Errors: Throws if no break target found
Behavior:
- Pop frames from call stack
- For each frame, restore its returnScope and returnAddress
- Stop when finding frame with isBreakTarget = true
- Resume execution at that frame's return address
Note on CONTINUE: There is no CONTINUE opcode. Compilers implement continue behavior using JUMP with negative offsets to jump back to the loop start.
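The BREAK unwinding steps can be sketched as follows. This is one reading of the behavior above, not the actual implementation: `BreakFrame` and `unwindForBreak` are hypothetical names, scope restoration is omitted, and the sketch assumes the break-target frame itself is popped before execution resumes at its return address.

```typescript
type BreakFrame = { returnAddress: number; isBreakTarget: boolean };

// Pops frames until a break target is found and returns the address to resume at.
// Mutates `callStack` in place.
function unwindForBreak(callStack: BreakFrame[]): number {
  while (callStack.length > 0) {
    const frame = callStack.pop() as BreakFrame;
    if (frame.isBreakTarget) return frame.returnAddress;
  }
  throw new Error('BREAK with no break target');
}

const exampleCallStack: BreakFrame[] = [
  { returnAddress: 10, isBreakTarget: false },
  { returnAddress: 20, isBreakTarget: true },  // frame that called the iterator
  { returnAddress: 30, isBreakTarget: false }, // innermost frame (loop body)
];
console.log(unwindForBreak(exampleCallStack)); // 20
```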
Exception Handling
PUSH_TRY
Operand: Catch block offset (number)
Effect: Push exception handler
Stack: No change
Registers a try block. If THROW occurs before POP_TRY, execution jumps to catch address.
PUSH_FINALLY
Operand: Finally block offset (number)
Effect: Add finally address to most recent exception handler
Stack: No change
Errors: Throws if no exception handler to modify
Adds a finally block to the current try/catch. The finally block will execute whether an exception is thrown or not.
POP_TRY
Operand: None
Effect: Pop exception handler (try block completed without exception)
Stack: No change
Errors: Throws if no handler to pop
Behavior:
- Pop exception handler
- Continue to next instruction
Notes:
- The VM does NOT automatically jump to finally blocks on POP_TRY
- The compiler must explicitly generate JUMP instructions to finally blocks when the try block completes normally
- The compiler must ensure catch blocks also jump to finally when present
- Finally blocks should end with normal control flow (no special terminator needed)
THROW
Operand: None
Effect: Throw exception with error value from stack
Stack: [errorValue] → (unwound)
Behavior:
- Pop error value from stack
- If no exception handlers, throw JavaScript Error with error message
- Otherwise, pop most recent exception handler
- Unwind call stack to handler's depth
- Restore handler's scope
- Push error value back onto stack
- If handler has finallyAddress, jump there; otherwise jump to catchAddress
Notes:
- When THROW jumps to finally (if present), the error value remains on stack for the finally block
- The compiler must structure catch/finally blocks appropriately to handle the error value
- If finally is present, the catch block is typically entered via a jump from the finally block or through explicit compiler-generated control flow
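The THROW steps can be sketched as a function over a simplified VM state. This is an illustration under stated assumptions: `HandlerSketch`, `ThrowState`, and `throwSketch` are hypothetical names, error values are plain strings, call frames are reduced to return addresses, and scope restoration is left out.

```typescript
type HandlerSketch = { catchAddress: number; finallyAddress?: number; callStackDepth: number };
type ThrowState = {
  pc: number;
  valueStack: string[];
  callStack: number[]; // return addresses only, for brevity
  handlers: HandlerSketch[];
};

function throwSketch(state: ThrowState): void {
  const errorValue = state.valueStack.pop();
  if (errorValue === undefined) throw new Error('Stack underflow');

  const handler = state.handlers.pop();
  if (!handler) throw new Error(errorValue); // uncaught: surfaces as a JavaScript Error

  state.callStack.length = handler.callStackDepth; // unwind to the handler's depth
  state.valueStack.push(errorValue);               // error stays available to catch/finally
  state.pc = handler.finallyAddress ?? handler.catchAddress;
}

const throwState: ThrowState = {
  pc: 3,
  valueStack: ['boom'],
  callStack: [100, 200, 300],
  handlers: [{ catchAddress: 7, callStackDepth: 1 }],
};
throwSketch(throwState);
console.log(throwState.pc, throwState.callStack.length); // 7 1
```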
Function Operations
MAKE_FUNCTION
Operand: Index into constants pool (number)
Effect: Create function value, capturing current scope
Stack: [] → [function]
The constant must be a function_def with:
- params: Parameter names
- defaults: Map of param names to constant indices for default values
- body: Instruction address of function body
- variadic: If true, second-to-last param (if named is also true) or last param collects remaining positional args as array
- named: If true, last param collects unmatched named args as dict
The created function captures currentScope as its parentScope.
CALL
Operand: None
Stack: [fn, arg1, arg2, ..., name1, val1, name2, val2, ..., positionalCount, namedCount] → [returnValue]
Behavior:
- Pop namedCount from stack (top of stack)
- Pop positionalCount from stack
- Pop named arguments (name/value pairs) from stack
- Pop positional arguments from stack
- Pop function from stack
- Mark current frame (if exists) as break target (isBreakTarget = true)
- Push new call frame with current PC and scope
- Create new scope with function's parentScope as parent
- Bind parameters:
- For regular functions: bind params by position, then by name, then defaults, then null
- For variadic functions: bind fixed params, collect rest into array
- For functions with named: true: bind fixed params by position/name, collect unmatched named args into dict
- Set currentScope to new scope
- Jump to function body
Parameter Binding Priority (for fixed params):
- Named argument (if provided and matches param name)
- Positional argument (if provided)
- Default value (if defined)
- Null
Named Args Handling:
- Named args that match fixed parameter names are bound to those params
- If the function has named: true, remaining named args (that don't match any fixed param) are collected into the last parameter as a dict
- This allows flexible calling: fn(x=10, y=20, extra=30) where extra goes to the named args dict
Errors: Throws if top of stack is not a function
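The parameter binding rules for CALL can be sketched as below. This is a hedged illustration, not the VM's code: `FunctionDefSketch` and `bindParams` are hypothetical names, values are typed as `unknown` for brevity, and the sketch assumes positional arguments are matched to fixed params purely by index, even when a named argument has already claimed that param.

```typescript
type FunctionDefSketch = {
  params: string[];
  defaults: Record<string, number>; // param name -> constants pool index
  variadic: boolean;
  named: boolean;
};

function bindParams(
  fn: FunctionDefSketch,
  positional: unknown[],
  namedArgs: Map<string, unknown>,
  constants: unknown[],
): Map<string, unknown> {
  const bound = new Map<string, unknown>();

  // Collector params sit at the end of the list: [...fixed, variadic?, named?]
  let fixedCount = fn.params.length;
  if (fn.named) fixedCount -= 1;
  if (fn.variadic) fixedCount -= 1;
  const fixed = fn.params.slice(0, fixedCount);

  // Priority for fixed params: named arg, positional arg, default, null.
  fixed.forEach((param, i) => {
    if (namedArgs.has(param)) bound.set(param, namedArgs.get(param));
    else if (i < positional.length) bound.set(param, positional[i]);
    else if (param in fn.defaults) bound.set(param, constants[fn.defaults[param]]);
    else bound.set(param, null);
  });

  if (fn.variadic) {
    // Remaining positional args are collected into an array.
    bound.set(fn.params[fixedCount], positional.slice(fixedCount));
  }
  if (fn.named) {
    // Named args that match no fixed param are collected into a dict.
    const extras = new Map<string, unknown>();
    namedArgs.forEach((value, key) => {
      if (fixed.indexOf(key) === -1) extras.set(key, value);
    });
    bound.set(fn.params[fn.params.length - 1], extras);
  }
  return bound;
}
```

For a hypothetical function with params `path`, `mode`, `rest`, `opts` (variadic and named both true, and a default for `mode`), calling it with one positional argument and `recursive=true` would bind `path` by position, `mode` from its default, `rest` to an empty array, and `opts` to a dict containing `recursive`.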
TAIL_CALL
Operand: None
Effect: Same as CALL, but reuses current call frame
Stack: [fn, arg1, arg2, ..., name1, val1, name2, val2, ..., positionalCount, namedCount] → [returnValue]
Behavior: Identical to CALL except:
- Does NOT push a new call frame
- Replaces currentScope instead of creating nested scope
- Enables unbounded tail recursion without stack overflow
RETURN
Operand: None
Effect: Return from function
Stack: [returnValue] → (restored stack with returnValue on top)
Behavior:
- Pop return value (or null if stack empty)
- Pop call frame
- Restore scope from frame
- Set PC to frame's return address
- Push return value onto stack
Errors: Throws if no call frame to return from
TRY_CALL
Operand: Variable name (string)
Effect: Conditionally call function or push value/string onto stack
Stack: [] → [returnValue | value | name]
Errors: Never throws (unlike CALL)
Behavior:
- Look up variable by name in scope chain
- If variable is a function: Call it with 0 arguments (no positional, no named) and push the returned value onto the stack.
- If variable exists but is not a function: Push the variable's value onto stack
- If variable doesn't exist: Push the variable name as a string onto stack
Use Cases:
- DSL/templating languages with "call if callable, otherwise use as literal" semantics
- Shell-like behavior where unknown identifiers become strings
- Optional function hooks (call if defined, silently skip if not)
Implementation Note:
- Uses intentional fall-through in VM switch statement from TRY_CALL to CALL case
- When function is found, stacks are set up to match CALL's expectations exactly
- No break target marking or frame pushing occurs when non-function value is found
Example:
MAKE_FUNCTION () .body
STORE greet
PUSH 42
STORE answer
TRY_CALL greet ; Calls function greet(), returns its value
TRY_CALL answer ; Pushes 42 (number value)
TRY_CALL unknown ; Pushes "unknown" (string)
.body:
PUSH "Hello!"
RETURN
Array Operations
MAKE_ARRAY
Operand: Number of items (number)
Effect: Create array from N stack items
Stack: [item1, item2, ..., itemN] → [array]
Items are popped in reverse order (item1 is array[0]).
ARRAY_GET
Operand: None
Effect: Get array element at index
Stack: [array, index] → [value]
Errors: Throws if not array or index out of bounds
Index is coerced to number and floored.
ARRAY_SET
Operand: None
Effect: Set array element at index (mutates array)
Stack: [array, index, value] → []
Errors: Throws if not array or index out of bounds
ARRAY_PUSH
Operand: None
Effect: Append value to end of array (mutates array, grows by 1)
Stack: [array, value] → []
Errors: Throws if not array
ARRAY_LEN
Operand: None
Effect: Get array length
Stack: [array] → [length]
Errors: Throws if not array
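The index handling and error cases of ARRAY_GET can be sketched like this. `arrayGet` is a hypothetical helper; the sketch assumes the index has already been coerced to a number and treats negative indices as out of bounds, which the text above does not state explicitly.

```typescript
function arrayGet(target: { type: string; value: unknown }, rawIndex: number): unknown {
  if (target.type !== 'array' || !Array.isArray(target.value)) {
    throw new Error('ARRAY_GET: not an array');
  }
  const index = Math.floor(rawIndex); // index is floored after coercion
  if (index < 0 || index >= target.value.length) {
    throw new Error(`ARRAY_GET: index ${index} out of bounds`);
  }
  return target.value[index];
}

console.log(arrayGet({ type: 'array', value: [10, 20, 30] }, 1.9)); // 20
```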
Dictionary Operations
MAKE_DICT
Operand: Number of key-value pairs (number)
Effect: Create dict from N key-value pairs
Stack: [key1, val1, key2, val2, ...] → [dict]
Keys are coerced to strings.
DICT_GET
Operand: None
Effect: Get dict value for key
Stack: [dict, key] → [value]
Returns null if key not found. Key is coerced to string.
Errors: Throws if not dict
DICT_SET
Operand: None
Effect: Set dict value for key (mutates dict)
Stack: [dict, key, value] → []
Key is coerced to string.
Errors: Throws if not dict
DICT_HAS
Operand: None
Effect: Check if key exists in dict
Stack: [dict, key] → [boolean]
Key is coerced to string.
Errors: Throws if not dict
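MAKE_DICT's stack effect can be sketched as follows. `makeDict` is a hypothetical helper operating on untyped stack entries, and JavaScript's `String()` is used here only as a stand-in for the spec's toString coercion of keys.

```typescript
// Removes the top 2 * pairCount entries from the stack and builds a dict from them.
function makeDict(valueStack: unknown[], pairCount: number): Map<string, unknown> {
  const itemCount = pairCount * 2;
  if (valueStack.length < itemCount) throw new Error('Stack underflow');

  // splice keeps the original push order: [key1, val1, key2, val2, ...]
  const flat = valueStack.splice(valueStack.length - itemCount, itemCount);
  const dict = new Map<string, unknown>();
  for (let i = 0; i < flat.length; i += 2) {
    dict.set(String(flat[i]), flat[i + 1]); // keys are coerced to strings
  }
  return dict;
}

const dictStack: unknown[] = ['keep', 'a', 1, 2, 'two'];
const builtDict = makeDict(dictStack, 2);
console.log(builtDict.get('a'), builtDict.get('2'), dictStack.length); // 1 two 1
```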
TypeScript Interop
CALL_NATIVE
Operand: Function name (string)
Effect: Call registered TypeScript function
Stack: [...args] → [returnValue]
Behavior:
- Look up function by name in registry
- Mark current frame (if exists) as break target
- Await function call (native function receives arguments and returns a Value)
- Push return value onto stack
Notes:
- TypeScript functions are passed the raw stack values as arguments
- They must return a valid Value
- They can be async (VM awaits them)
- Like CALL, but function is from TypeScript registry instead of stack
Errors: Throws if function not found
TypeScript Function Signature:
type TypeScriptFunction = (...args: Value[]) => Promise<Value> | Value;
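A registry of native functions matching this signature might look like the sketch below. `NativeValue`, `NativeFn`, `nativeRegistry`, and `callNative` are hypothetical names, the value type is reduced to three variants, and how arguments are gathered from the stack is not modeled; they are simply passed in as an array.

```typescript
type NativeValue =
  | { type: 'null'; value: null }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string };
type NativeFn = (...args: NativeValue[]) => Promise<NativeValue> | NativeValue;

const nativeRegistry = new Map<string, NativeFn>();

// A synchronous native function.
nativeRegistry.set('add', (a, b) => {
  const left = a.type === 'number' ? a.value : 0;
  const right = b.type === 'number' ? b.value : 0;
  return { type: 'number', value: left + right };
});

// An async native function; the caller awaits it.
nativeRegistry.set('echoLater', async (v) => v);

async function callNative(fnName: string, args: NativeValue[]): Promise<NativeValue> {
  const fn = nativeRegistry.get(fnName);
  if (!fn) throw new Error(`Unknown native function: ${fnName}`);
  return await fn(...args); // works for both sync and async functions
}

callNative('add', [{ type: 'number', value: 1 }, { type: 'number', value: 2 }])
  .then((sum) => console.log(sum.value)); // 3
```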
Special
HALT
Operand: None
Effect: Stop execution
Stack: No change
Label Syntax
The bytecode format supports labels for improved readability:
Label Definition: .label_name: marks an instruction position
Label Reference: .label_name in operands (e.g., JUMP .loop_start)
Labels are resolved to numeric offsets during parsing. The original numeric offset syntax (#N) is still supported for backwards compatibility.
Example with labels:
JUMP .skip
.middle:
PUSH 999
HALT
.skip:
PUSH 42
HALT
Equivalent with numeric offsets:
JUMP #2
PUSH 999
HALT
PUSH 42
HALT
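Label resolution for jump operands can be sketched as a two-pass transform over source lines. This is a simplified illustration: `resolveLabels` is a hypothetical function, it only rewrites a label in the last token of a line, it ignores comments and quoted strings, and it treats every label as a relative jump target. MAKE_FUNCTION bodies are instruction addresses and would need separate handling. The relative offset is computed against the PC after increment.

```typescript
function resolveLabels(lines: string[]): string[] {
  const labelIndex = new Map<string, number>();
  const instructionLines: string[] = [];

  // Pass 1: record the instruction index each label points at.
  for (const raw of lines) {
    const line = raw.trim();
    if (line === '') continue;
    if (line.startsWith('.') && line.endsWith(':')) {
      labelIndex.set(line.slice(0, -1), instructionLines.length);
    } else {
      instructionLines.push(line);
    }
  }

  // Pass 2: replace label operands with offsets relative to the next instruction.
  return instructionLines.map((line, index) => {
    const tokens = line.split(/\s+/);
    const last = tokens[tokens.length - 1];
    if (tokens.length < 2 || !last.startsWith('.')) return line;
    const target = labelIndex.get(last);
    if (target === undefined) throw new Error(`Unknown label: ${last}`);
    tokens[tokens.length - 1] = `#${target - (index + 1)}`;
    return tokens.join(' ');
  });
}

const resolvedExample = resolveLabels([
  'JUMP .skip',
  '.middle:',
  'PUSH 999',
  'HALT',
  '.skip:',
  'PUSH 42',
  'HALT',
]);
console.log(resolvedExample.join('\n'));
// JUMP #2
// PUSH 999
// HALT
// PUSH 42
// HALT
```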
Common Bytecode Patterns
If-Else Statement
LOAD 'x'
PUSH 5
GT
JUMP_IF_FALSE .else
# then block
JUMP .end
.else:
# else block
.end:
While Loop
.loop_start:
# condition
JUMP_IF_FALSE .loop_end
# body
JUMP .loop_start
.loop_end:
Function Definition
MAKE_FUNCTION <params> .function_body
STORE 'functionName'
JUMP .skip_body
.function_body:
# function code
RETURN
.skip_body:
Try-Catch
PUSH_TRY .catch
; try block
POP_TRY
JUMP .end
.catch:
STORE 'errorVar' ; Error is on stack
; catch block
.end:
Try-Catch-Finally
PUSH_TRY .catch
PUSH_FINALLY .finally
; try block
POP_TRY
JUMP .finally
.catch:
STORE 'errorVar' ; Error is on stack
; catch block
JUMP .finally
.finally:
; finally block (executes in both cases)
.end:
Named Function Call
LOAD 'mkdir'
PUSH 'src/bin' # positional arg
PUSH 'recursive' # name
PUSH true # value
PUSH 1 # positionalCount
PUSH 1 # namedCount
CALL
Tail Recursive Function
MAKE_FUNCTION (n acc) .factorial_body
STORE 'factorial'
JUMP .main
.factorial_body:
LOAD 'n'
PUSH 0
EQ
JUMP_IF_FALSE .recurse
LOAD 'acc'
RETURN
.recurse:
LOAD 'factorial'
LOAD 'n'
PUSH 1
SUB
LOAD 'n'
LOAD 'acc'
MUL
PUSH 2 # positionalCount
PUSH 0 # namedCount
TAIL_CALL # No stack growth!
.main:
LOAD 'factorial'
PUSH 5
PUSH 1
PUSH 2 # positionalCount
PUSH 0 # namedCount
CALL
Error Conditions
Runtime Errors
All of these should throw errors:
- Undefined Variable: LOAD of non-existent variable
- Type Mismatch: ARRAY_GET on non-array, DICT_GET on non-dict, CALL on non-function
- Index Out of Bounds: ARRAY_GET/SET with invalid index
- Stack Underflow: Arithmetic ops without enough operands
- Uncaught Exception: THROW with no exception handlers
- Break Outside Loop: BREAK with no break target
- Continue Outside Loop: Not raised by the VM. There is no CONTINUE opcode; continue is compiled to JUMP, so any such check belongs to the compiler
- Return Outside Function: RETURN with no call frame
- Unknown Function: CALL_NATIVE with unregistered function
- Mismatched Handler: POP_TRY with no handler
- Invalid Constant: PUSH with invalid constant index
- Invalid Function Definition: MAKE_FUNCTION with non-function_def constant
Edge Cases
Empty Stack
- Arithmetic/comparison ops on empty stack should throw
- RETURN with empty stack returns null
- HALT with empty stack returns null
Null Values
- Arithmetic with null coerces to 0
- Comparisons with null work normally
- Null is falsy
Scope Shadowing
- Variables in inner scopes shadow outer scopes during LOAD
- STORE updates the nearest enclosing scope where the variable is defined (current scope first, then parents)
Function Parameter Binding
- Missing positional args → use named args → use defaults → use null
- Extra positional args → collected by variadic parameter or ignored
- Extra named args → collected by named args parameter (if named: true) or ignored
- Named arg matching is case-sensitive
Tail Call Optimization
- TAIL_CALL reuses frame, so return address is from original caller
- Multiple tail calls in sequence never grow stack
- TAIL_CALL can call different function (not just self-recursive)
Break/Continue Semantics
- BREAK unwinds to frame that called the iterator function
- Multiple nested function calls: break exits all of them until reaching marked frame
- CONTINUE is implemented by the compiler using JUMPs
Exception Unwinding
- THROW unwinds call stack to handler's depth
- Exception handlers form a stack (nested try blocks)
- Error value on stack is available in catch/finally blocks
- When THROW occurs and handler has finallyAddress, VM jumps to finally first
- Compiler is responsible for structuring control flow so finally executes in all cases
- Finally typically executes after try (if no exception) or after catch (if exception), but control flow is compiler-managed
VM Initialization
const vm = new VM(bytecode);
vm.registerFunction('add', (a, b) => {
return { type: 'number', value: toNumber(a) + toNumber(b) }
})
const result = await vm.execute()
Testing Considerations
Unit Tests Should Cover
- Each opcode individually with minimal setup
- Type coercion for arithmetic, comparison, and logical ops
- Scope chain resolution (local, parent, global)
- Call frames (nested calls, return values)
- Exception handling (nested try blocks, unwinding, finally blocks)
- Break/continue (nested functions, iterator pattern)
- Closures (capturing variables, multiple nesting levels)
- Tail calls (self-recursive, mutual recursion)
- Parameter binding (positional, named, defaults, variadic, named args collection, combinations)
- Array/dict operations (creation, access, mutation)
- Error conditions (all error cases listed above)
- Edge cases (empty stack, null values, shadowing, etc.)
Integration Tests Should Cover
- Recursive functions (factorial, fibonacci)
- Iterator pattern (each with break)
- Closure examples (counters, adder factories)
- Exception examples (try/catch/throw chains)
- Complex scope (deeply nested functions)
- Mixed features (variadic + defaults + named args)
Property-Based Tests Should Cover
- Stack integrity (stack size matches expectations after ops)
- Scope integrity (variables remain accessible)
- Frame integrity (call stack unwinds correctly)
Version History
- 1.0 (2024): Initial specification
Notes
- PC increment happens after each instruction execution
- Jump instructions use relative offsets (added to current PC after increment)
- All async operations (native functions) must be awaited
- Arrays and dicts are mutable (pass by reference)
- Functions are immutable values
- The VM is single-threaded (no concurrency primitives)