ReefVM Specification
Version 1.0
Overview
The ReefVM is a stack-based bytecode virtual machine designed for the Shrimp programming language. It supports closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.
Architecture
Components
- Value Stack: Operand stack for computation
- Call Stack: Call frames for function invocations
- Exception Handlers: Stack of try/catch handlers
- Scope Chain: Linked scopes for lexical variable resolution
- Program Counter (PC): Current instruction index
- Constants Pool: Immutable values and function metadata
- Native Function Registry: External functions callable from Shrimp
Execution Model
- VM loads bytecode with instructions and constants
- PC starts at instruction 0
- Each instruction is executed sequentially (unless jumps occur)
- Execution continues until HALT or end of instructions
- Final value is top of stack (or null if empty)
Value Types
All runtime values are tagged unions:
type Value =
| { type: 'null', value: null }
| { type: 'boolean', value: boolean }
| { type: 'number', value: number }
| { type: 'string', value: string }
| { type: 'array', value: Value[] }
| { type: 'dict', value: Map<string, Value> }
| { type: 'function', params: string[], defaults: Record<string, Value>,
body: number, parentScope: Scope, variadic: boolean, kwargs: boolean }
Type Coercion
toNumber: number → identity, string → parseFloat (or 0), boolean → 1/0, others → 0
toString: string → identity, number → string, boolean → string, null → "null", function → "", array → "[item, item]", dict → "{key: value, ...}"
isTrue: Only null and false are falsy. Everything else (including 0, "", empty arrays, empty dicts) is truthy.
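The three coercion rules above could be sketched as follows. This is a minimal illustration, not the VM's actual code: the function variant of Value is left out, and valueToString is a hypothetical name standing in for the spec's toString.

```typescript
// Simplified Value: the 'function' variant is omitted to keep the sketch self-contained.
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: Value[] }
  | { type: 'dict'; value: Map<string, Value> };

function toNumber(v: Value): number {
  switch (v.type) {
    case 'number':
      return v.value;
    case 'string': {
      const n = parseFloat(v.value);
      return Number.isNaN(n) ? 0 : n; // "parseFloat (or 0)"
    }
    case 'boolean':
      return v.value ? 1 : 0;
    default:
      return 0; // null, array, dict
  }
}

// Hypothetical name; the spec calls this coercion toString.
function valueToString(v: Value): string {
  switch (v.type) {
    case 'string':
      return v.value;
    case 'number':
    case 'boolean':
      return String(v.value);
    case 'null':
      return 'null';
    case 'array':
      return '[' + v.value.map((item) => valueToString(item)).join(', ') + ']';
    case 'dict': {
      const parts: string[] = [];
      v.value.forEach((item, key) => parts.push(key + ': ' + valueToString(item)));
      return '{' + parts.join(', ') + '}';
    }
  }
}

function isTrue(v: Value): boolean {
  if (v.type === 'null') return false;
  if (v.type === 'boolean') return v.value;
  return true; // 0, "", [] and {} are all truthy
}
```

Note that isTrue differs from JavaScript truthiness: a number 0 or an empty string still counts as true.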
Bytecode Format
type Bytecode = {
instructions: Instruction[]
constants: Constant[]
}
type Instruction = {
op: OpCode
operand?: number | string
}
type Constant =
| Value
| { type: 'function_def', params: string[], defaults: Record<string, number>,
body: number, variadic: boolean, kwargs: boolean }
Scope Chain
Variables are resolved through a linked scope chain:
class Scope {
locals: Map<string, Value>;
parent?: Scope;
}
Variable Resolution (LOAD):
- Check current scope's locals
- If not found, recursively check parent
- If not found anywhere, throw error
Variable Assignment (STORE):
- If variable exists in current scope, update it
- Else if variable exists in any parent scope, update it there
- Else create new variable in current scope
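This implements "assign to the nearest enclosing scope where defined" semantics: the search starts at the current scope and walks outward, and a new variable is created locally only when no scope in the chain defines the name.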
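A minimal sketch of the LOAD and STORE rules listed above. The get, set, and owner methods are assumptions made for illustration (the spec only defines the locals and parent fields), and Value is reduced to numbers.

```typescript
type Value = { type: 'number'; value: number }; // simplified for the sketch

class Scope {
  locals = new Map<string, Value>();
  parent?: Scope;

  constructor(parent?: Scope) {
    this.parent = parent;
  }

  // Find the scope that defines `name`, starting here and walking outward.
  private owner(name: string): Scope | undefined {
    let s: Scope | undefined = this;
    while (s) {
      if (s.locals.has(name)) return s;
      s = s.parent;
    }
    return undefined;
  }

  // LOAD: throws if the variable is not found anywhere in the chain.
  get(name: string): Value {
    const s = this.owner(name);
    if (!s) throw new Error('Undefined variable: ' + name);
    return s.locals.get(name)!;
  }

  // STORE: update the defining scope if there is one, else create locally.
  set(name: string, value: Value): void {
    (this.owner(name) ?? this).locals.set(name, value);
  }
}
```

With this shape, a STORE inside a function body updates a variable of an enclosing scope when one exists, which is what lets closures mutate captured state.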
Call Frames
type CallFrame = {
returnAddress: number // Where to resume after RETURN
returnScope: Scope // Scope to restore after RETURN
isBreakTarget: boolean // Can be targeted by BREAK
}
Exception Handlers
type ExceptionHandler = {
catchAddress: number // Where to jump on exception
finallyAddress?: number // Where to jump for finally block (always runs)
callStackDepth: number // Call stack depth when handler pushed
scope: Scope // Scope to restore in catch block
}
Opcodes
Stack Operations
PUSH
Operand: Index into constants pool (number)
Effect: Push constant onto stack
Stack: [] → [value]
POP
Operand: None
Effect: Discard top of stack
Stack: [value] → []
DUP
Operand: None
Effect: Duplicate top of stack
Stack: [value] → [value, value]
Variable Operations
LOAD
Operand: Variable name (string)
Effect: Push variable value onto stack
Stack: [] → [value]
Errors: Throws if variable not found in scope chain
STORE
Operand: Variable name (string)
Effect: Store top of stack into variable (following scope chain rules)
Stack: [value] → []
Arithmetic Operations
All arithmetic operations pop two values, perform the operation, and push the result as a number.
ADD
Stack: [a, b] → [a + b]
Note: Only for numbers (use separate string concat if needed)
SUB
Stack: [a, b] → [a - b]
MUL
Stack: [a, b] → [a * b]
DIV
Stack: [a, b] → [a / b]
MOD
Stack: [a, b] → [a % b]
Comparison Operations
All comparison operations pop two values, compare them, and push a boolean result.
EQ
Stack: [a, b] → [boolean]
Note: Type-aware equality (deep comparison for arrays/dicts)
NEQ
Stack: [a, b] → [boolean]
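One way the type-aware, deep equality behind EQ might look is sketched below. The helper name valuesEqual is hypothetical, functions are left out of Value, and treating NEQ as the plain negation of EQ is an assumption.

```typescript
// Simplified Value: the 'function' variant is omitted.
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: Value[] }
  | { type: 'dict'; value: Map<string, Value> };

function valuesEqual(a: Value, b: Value): boolean {
  if (a.type === 'array' && b.type === 'array') {
    const bItems = b.value;
    return (
      a.value.length === bItems.length &&
      a.value.every((item, i) => valuesEqual(item, bItems[i]))
    );
  }
  if (a.type === 'dict' && b.type === 'dict') {
    const bMap = b.value;
    if (a.value.size !== bMap.size) return false;
    let same = true;
    a.value.forEach((item, key) => {
      const other = bMap.get(key);
      if (other === undefined || !valuesEqual(item, other)) same = false;
    });
    return same;
  }
  // Primitives: equal only when both the tag and the payload match, so 1 != "1".
  return a.type === b.type && a.value === b.value;
}

// Assumption: NEQ is the negation of EQ.
const valuesNotEqual = (a: Value, b: Value): boolean => !valuesEqual(a, b);
```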
LT
Stack: [a, b] → [boolean]
Note: Numeric comparison (values coerced to numbers)
GT
Stack: [a, b] → [boolean]
Note: Numeric comparison (values coerced to numbers)
LTE
Stack: [a, b] → [boolean]
Note: Numeric comparison (values coerced to numbers)
GTE
Stack: [a, b] → [boolean]
Note: Numeric comparison (values coerced to numbers)
Logical Operations
NOT
Stack: [a] → [!isTrue(a)]
Note on AND/OR: There are no AND/OR opcodes. Short-circuiting logical operations are implemented at the compiler level using JUMP instructions:
AND pattern (short-circuits if left side is false):
<evaluate left>
DUP
JUMP_IF_FALSE 2 # skip POP and <evaluate right>; 2 assumes a one-instruction right side (in general, 1 + its length)
POP
<evaluate right>
end:
OR pattern (short-circuits if left side is true):
<evaluate left>
DUP
JUMP_IF_TRUE 2 # skip POP and <evaluate right>; 2 assumes a one-instruction right side (in general, 1 + its length)
POP
<evaluate right>
end:
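To see why the DUP / conditional jump / POP sequence leaves exactly one value on the stack, here is a toy interpreter for just the opcodes involved. It is a sketch under simplifying assumptions: values are raw JavaScript values rather than tagged unions, PUSH takes a literal operand instead of a constants-pool index, and run and andPattern are hypothetical names.

```typescript
type Value = null | boolean | number | string; // simplified: raw values, no tags

type Instr =
  | { op: 'PUSH'; operand: Value } // simplified: literal operand
  | { op: 'DUP' }
  | { op: 'POP' }
  | { op: 'JUMP_IF_FALSE'; operand: number }
  | { op: 'JUMP_IF_TRUE'; operand: number };

const isTrue = (v: Value): boolean => v !== null && v !== false;

function run(program: Instr[]): Value {
  const stack: Value[] = [];
  let pc = 0;
  while (pc < program.length) {
    const ins = program[pc];
    pc += 1; // increment first; jump offsets are relative to this value
    switch (ins.op) {
      case 'PUSH':
        stack.push(ins.operand);
        break;
      case 'DUP':
        stack.push(stack[stack.length - 1]);
        break;
      case 'POP':
        stack.pop();
        break;
      case 'JUMP_IF_FALSE':
        if (!isTrue(stack.pop() ?? null)) pc += ins.operand;
        break;
      case 'JUMP_IF_TRUE':
        if (isTrue(stack.pop() ?? null)) pc += ins.operand;
        break;
    }
  }
  return stack.length > 0 ? stack[stack.length - 1] : null;
}

// The AND pattern with single-instruction operands on both sides.
const andPattern = (left: Value, right: Value): Instr[] => [
  { op: 'PUSH', operand: left },
  { op: 'DUP' },
  { op: 'JUMP_IF_FALSE', operand: 2 }, // consumes the copy; skips POP + right side
  { op: 'POP' }, // left was truthy: discard it
  { op: 'PUSH', operand: right },
];
```

When the left side is falsy, the jump consumes the duplicate and the original left value remains as the result; otherwise POP discards it and the right side becomes the result.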
Control Flow
JUMP
Operand: Offset (number)
Effect: Add offset to PC (relative jump)
Stack: No change
JUMP_IF_FALSE
Operand: Offset (number)
Effect: If top of stack is falsy, add offset to PC (relative jump)
Stack: [condition] → []
JUMP_IF_TRUE
Operand: Offset (number)
Effect: If top of stack is truthy, add offset to PC (relative jump)
Stack: [condition] → []
BREAK
Operand: None
Effect: Unwind call stack until frame with isBreakTarget = true, resume there
Stack: No change
Errors: Throws if no break target found
Behavior:
- Pop frames from call stack
- For each frame, restore its returnScope and returnAddress
- Stop when finding frame with isBreakTarget = true
- Resume execution at that frame's return address
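The unwinding steps above might be implemented along these lines. VmState and executeBreak are hypothetical names, and Scope is reduced to a placeholder so the sketch stands alone.

```typescript
type Scope = { label: string }; // placeholder for the real Scope class

type CallFrame = {
  returnAddress: number;
  returnScope: Scope;
  isBreakTarget: boolean;
};

type VmState = {
  callStack: CallFrame[];
  pc: number;
  currentScope: Scope;
};

function executeBreak(vm: VmState): void {
  while (vm.callStack.length > 0) {
    const frame = vm.callStack.pop()!;
    // Restore each popped frame's scope and return address as we go.
    vm.currentScope = frame.returnScope;
    vm.pc = frame.returnAddress;
    if (frame.isBreakTarget) return; // resume here
  }
  throw new Error('BREAK with no break target');
}
```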
Note on CONTINUE: There is no CONTINUE opcode. Compilers implement continue behavior using JUMP with negative offsets to jump back to the loop start.
Exception Handling
PUSH_TRY
Operand: Catch block offset (number)
Effect: Push exception handler
Stack: No change
Registers a try block. If THROW occurs before POP_TRY, execution jumps to catch address.
PUSH_FINALLY
Operand: Finally block offset (number)
Effect: Add finally address to most recent exception handler
Stack: No change
Errors: Throws if no exception handler to modify
Adds a finally block to the current try/catch. The finally block will execute whether an exception is thrown or not.
POP_TRY
Operand: None
Effect: Pop exception handler (try block completed without exception)
Stack: No change
Errors: Throws if no handler to pop
Behavior:
- Pop exception handler
- If handler has finallyAddress, jump there
- Otherwise continue to next instruction
Notes:
- The VM ensures finally runs when try completes normally
- The compiler must ensure catch blocks jump to finally when present
- Finally blocks should end with normal control flow (no special terminator needed)
THROW
Operand: None
Effect: Throw exception with error value from stack
Stack: [errorValue] → (unwound)
Behavior:
- Pop error value from stack
- If no exception handlers, throw JavaScript Error with error message
- Otherwise, pop most recent exception handler
- Unwind call stack to handler's depth
- Restore handler's scope
- Push error value back onto stack
- Jump to handler's catch address
- Note: After catch block executes, compiler must jump to finally if present
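As a rough illustration of the THROW steps, the sketch below models error values as plain strings. VmState and executeThrow are hypothetical names, and the stack-underflow check is an assumption, since the steps above do not say what happens when the stack is empty.

```typescript
type Scope = { label: string }; // placeholder for the real Scope class
type CallFrame = { returnAddress: number; returnScope: Scope; isBreakTarget: boolean };
type ExceptionHandler = {
  catchAddress: number;
  finallyAddress?: number;
  callStackDepth: number;
  scope: Scope;
};
type VmState = {
  stack: string[]; // simplified: error values are plain strings here
  callStack: CallFrame[];
  handlers: ExceptionHandler[];
  pc: number;
  currentScope: Scope;
};

function executeThrow(vm: VmState): void {
  const errorValue = vm.stack.pop();
  if (errorValue === undefined) throw new Error('Stack underflow'); // assumption
  const handler = vm.handlers.pop();
  if (!handler) throw new Error(errorValue); // uncaught: surfaces as a JavaScript Error
  // Unwind the call stack to the depth recorded when the handler was pushed.
  while (vm.callStack.length > handler.callStackDepth) vm.callStack.pop();
  vm.currentScope = handler.scope;
  vm.stack.push(errorValue); // the catch block can STORE it
  vm.pc = handler.catchAddress;
}
```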
Function Operations
MAKE_FUNCTION
Operand: Index into constants pool (number)
Effect: Create function value, capturing current scope
Stack: [] → [function]
The constant must be a function_def with:
- params: Parameter names
- defaults: Map of param names to constant indices for default values
- body: Instruction address of function body
- variadic: If true, last param collects remaining positional args as array
- kwargs: If true, last param collects all named args as dict
The created function captures currentScope as its parentScope.
CALL
Operand: None
Stack: [fn, arg1, arg2, ..., name1, val1, name2, val2, ..., positionalCount, namedCount] → [returnValue]
Behavior:
- Pop namedCount from stack (top of stack)
- Pop positionalCount from stack
- Pop named arguments (name/value pairs) from stack
- Pop positional arguments from stack
- Pop function from stack
- Mark current frame (if exists) as break target (isBreakTarget = true)
- Push new call frame with current PC and scope
- Create new scope with function's parentScope as parent
- Bind parameters:
- For regular functions: bind params by position, then by name, then defaults, then null
- For variadic functions: bind fixed params, collect rest into array
- For kwargs functions: bind fixed params by position/name, collect unmatched named args into dict
- Set currentScope to new scope
- Jump to function body
Parameter Binding Priority (for fixed params):
- Named argument (if provided and matches param name)
- Positional argument (if provided)
- Default value (if defined)
- Null
Named Args Handling:
- Named args that match fixed parameter names are bound to those params
- Remaining named args (that don't match any fixed param) are collected into the @kwargs dict
- This allows flexible calling: fn(x=10, y=20, extra=30) where extra goes to kwargs
Errors: Throws if the value in the fn position (popped last, beneath the arguments and counts) is not a function
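The binding priority for CALL can be sketched as a single function. Everything here is illustrative: bindParams and FnShape are hypothetical names, values are raw JavaScript values, and two points the text leaves open are filled in by assumption (when both flags are set, the kwargs param is last and the variadic param precedes it; positional arguments are consumed in order only by params that were not bound by name).

```typescript
type Value = null | number | string | Value[] | Map<string, Value>; // simplified

type FnShape = {
  params: string[];
  defaults: Record<string, Value>;
  variadic: boolean;
  kwargs: boolean;
};

function bindParams(
  fn: FnShape,
  positional: Value[],
  named: Map<string, Value>,
): Map<string, Value> {
  const bound = new Map<string, Value>();
  const remainingNamed = new Map<string, Value>(named);
  const fixed = fn.params.slice();
  // Assumption: kwargs param is last; variadic param comes just before it.
  const kwargsParam = fn.kwargs ? fixed.pop() : undefined;
  const variadicParam = fn.variadic ? fixed.pop() : undefined;

  let next = 0; // index of the next unconsumed positional argument
  for (const param of fixed) {
    if (remainingNamed.has(param)) {
      bound.set(param, remainingNamed.get(param) ?? null); // 1. named
      remainingNamed.delete(param);
    } else if (next < positional.length) {
      bound.set(param, positional[next++]); // 2. positional
    } else if (Object.prototype.hasOwnProperty.call(fn.defaults, param)) {
      bound.set(param, fn.defaults[param]); // 3. default
    } else {
      bound.set(param, null); // 4. null
    }
  }
  if (variadicParam !== undefined) bound.set(variadicParam, positional.slice(next));
  if (kwargsParam !== undefined) bound.set(kwargsParam, remainingNamed);
  return bound;
}
```

For the call fn(x=10, y=20, extra=30) against params x, y plus a kwargs param, this binds x and y by name and leaves only extra in the kwargs dict.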
TAIL_CALL
Operand: None
Effect: Same as CALL, but reuses current call frame
Stack: [fn, arg1, arg2, ..., name1, val1, name2, val2, ..., positionalCount, namedCount] → [returnValue]
Behavior: Identical to CALL except:
- Does NOT push a new call frame
- Replaces currentScope instead of creating nested scope
- Enables unbounded tail recursion without stack overflow
RETURN
Operand: None
Effect: Return from function
Stack: [returnValue] → (restored stack with returnValue on top)
Behavior:
- Pop return value (or null if stack empty)
- Pop call frame
- Restore scope from frame
- Set PC to frame's return address
- Push return value onto stack
Errors: Throws if no call frame to return from
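The difference in frame handling between CALL, TAIL_CALL, and RETURN is sketched below with argument passing and the value stack left out. The names VmState, executeCall, executeTailCall, and executeReturn are hypothetical, and Scope is reduced to a plain object.

```typescript
type Scope = { locals: Map<string, number>; parent?: Scope };
type CallFrame = { returnAddress: number; returnScope: Scope; isBreakTarget: boolean };
type Fn = { body: number; parentScope: Scope };
type VmState = { pc: number; currentScope: Scope; callStack: CallFrame[] };

// Shared tail end of CALL and TAIL_CALL: fresh scope under the closure's scope, jump to body.
function enterFunction(vm: VmState, fn: Fn): void {
  vm.currentScope = { locals: new Map<string, number>(), parent: fn.parentScope };
  vm.pc = fn.body;
}

function executeCall(vm: VmState, fn: Fn): void {
  const current = vm.callStack[vm.callStack.length - 1];
  if (current) current.isBreakTarget = true; // mark current frame, if any
  vm.callStack.push({
    returnAddress: vm.pc,
    returnScope: vm.currentScope,
    isBreakTarget: false,
  });
  enterFunction(vm, fn);
}

function executeTailCall(vm: VmState, fn: Fn): void {
  enterFunction(vm, fn); // no new frame: the original caller's frame is reused
}

function executeReturn(vm: VmState): void {
  const frame = vm.callStack.pop();
  if (!frame) throw new Error('RETURN with no call frame');
  vm.currentScope = frame.returnScope;
  vm.pc = frame.returnAddress;
}
```

Because executeTailCall never pushes a frame, any number of tail calls in a row leaves the call stack at the same depth, and the eventual RETURN goes straight back to the original caller.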
Array Operations
MAKE_ARRAY
Operand: Number of items (number)
Effect: Create array from N stack items
Stack: [item1, item2, ..., itemN] → [array]
Items are popped in reverse order (item1 is array[0]).
ARRAY_GET
Operand: None
Effect: Get array element at index
Stack: [array, index] → [value]
Errors: Throws if not array or index out of bounds
Index is coerced to number and floored.
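A small sketch of the index handling for ARRAY_GET, assuming the index arrives as a raw number or string. arrayGet is a hypothetical helper, and treating negative indices as out of bounds is an assumption the text does not state.

```typescript
// Simplified: elements are generic and the index is a raw number or string.
function arrayGet<T>(items: T[], rawIndex: number | string): T {
  // Coerce to number (unparseable strings become 0), then floor.
  const asNumber = typeof rawIndex === 'number' ? rawIndex : parseFloat(rawIndex) || 0;
  const index = Math.floor(asNumber);
  // Assumption: negative (and NaN) indices count as out of bounds.
  if (!(index >= 0 && index < items.length)) {
    throw new Error('Index out of bounds: ' + index);
  }
  return items[index];
}
```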
ARRAY_SET
Operand: None
Effect: Set array element at index (mutates array)
Stack: [array, index, value] → []
Errors: Throws if not array or index out of bounds
ARRAY_PUSH
Operand: None
Effect: Append value to end of array (mutates array, grows by 1)
Stack: [array, value] → []
Errors: Throws if not array
ARRAY_LEN
Operand: None
Effect: Get array length
Stack: [array] → [length]
Errors: Throws if not array
Dictionary Operations
MAKE_DICT
Operand: Number of key-value pairs (number)
Effect: Create dict from N key-value pairs
Stack: [key1, val1, key2, val2, ...] → [dict]
Keys are coerced to strings.
DICT_GET
Operand: None
Effect: Get dict value for key
Stack: [dict, key] → [value]
Returns null if key not found. Key is coerced to string.
Errors: Throws if not dict
DICT_SET
Operand: None
Effect: Set dict value for key (mutates dict)
Stack: [dict, key, value] → []
Key is coerced to string.
Errors: Throws if not dict
DICT_HAS
Operand: None
Effect: Check if key exists in dict
Stack: [dict, key] → [boolean]
Key is coerced to string.
Errors: Throws if not dict
TypeScript Interop
CALL_NATIVE
Operand: Function name (string)
Effect: Call registered TypeScript function
Stack: [...args] → [returnValue]
Behavior:
- Look up function by name in registry
- Mark current frame (if exists) as break target
- Await function call (native function receives arguments and returns a Value)
- Push return value onto stack
Notes:
- TypeScript functions are passed the raw stack values as arguments
- They must return a valid Value
- They can be async (VM awaits them)
- Like CALL, but function is from TypeScript registry instead of stack
Errors: Throws if function not found
TypeScript Function Signature:
type TypeScriptFunction = (...args: Value[]) => Promise<Value> | Value;
Special
HALT
Operand: None
Effect: Stop execution
Stack: No change
Common Bytecode Patterns
If-Else Statement
LOAD 'x'
PUSH 5
GT
JUMP_IF_FALSE N+1 # skip then block and its trailing JUMP, landing on else block
# then block (N instructions)
JUMP M # skip else block
# else block (M instructions)
While Loop
loop_start:
# condition (C instructions)
JUMP_IF_FALSE N # jump past loop body and the backward JUMP
# body (N-1 instructions)
JUMP -(C+N+1) # jump back to loop_start; the offset is applied after the PC increment
loop_end:
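Since jump operands are relative and are added to the PC after it has been incremented (see Notes), the concrete operands of a while loop depend on the lengths of both the condition and the body. The helpers below are hypothetical and only illustrate the arithmetic for a loop laid out as condition, JUMP_IF_FALSE, body, JUMP.

```typescript
// Target reached by a relative jump: the PC is incremented first, then the offset is added.
function jumpTarget(pc: number, offset: number): number {
  return pc + 1 + offset;
}

// Operands for a loop laid out as [condition][JUMP_IF_FALSE][body][JUMP].
function whileLoopOffsets(
  conditionLength: number,
  bodyLength: number,
): { exit: number; back: number } {
  return {
    exit: bodyLength + 1, // skip the body and the backward JUMP
    back: -(conditionLength + 1 + bodyLength + 1), // back over JUMP, body, JUMP_IF_FALSE, condition
  };
}
```

For a 2-instruction condition starting at index 0 and a 3-instruction body, JUMP_IF_FALSE sits at index 2 and the backward JUMP at index 6, so the operands are 4 and -7.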
Function Definition
MAKE_FUNCTION <index>
STORE 'functionName'
JUMP N # skip function body
function_body:
# function code (N-1 instructions, so that JUMP N also skips the RETURN)
RETURN
skip_body:
Try-Catch
PUSH_TRY catch_label
# try block
POP_TRY
JUMP end_label
catch_label:
STORE 'errorVar' # Error is on stack
# catch block
end_label:
Named Function Call
LOAD 'mkdir'
PUSH 'src/bin' # positional arg
PUSH 'recursive' # name
PUSH true # value
PUSH 1 # positionalCount
PUSH 1 # namedCount
CALL
Tail Recursive Function
MAKE_FUNCTION <factorial_def>
STORE 'factorial'
JUMP 16 # skip the 16-instruction function body, landing on main
factorial_body:
LOAD 'n'
PUSH 0
EQ
JUMP_IF_FALSE 2 # skip to recurse
LOAD 'acc'
RETURN
recurse:
LOAD 'factorial'
LOAD 'n'
PUSH 1
SUB
LOAD 'n'
LOAD 'acc'
MUL
PUSH 2 # positionalCount
PUSH 0 # namedCount
TAIL_CALL # No stack growth!
main:
LOAD 'factorial'
PUSH 5
PUSH 1
PUSH 2 # positionalCount
PUSH 0 # namedCount
CALL
Error Conditions
Runtime Errors
All of these should throw errors:
- Undefined Variable: LOAD of non-existent variable
- Type Mismatch: ARRAY_GET on non-array, DICT_GET on non-dict, CALL on non-function
- Index Out of Bounds: ARRAY_GET/SET with invalid index
- Stack Underflow: Arithmetic ops without enough operands
- Uncaught Exception: THROW with no exception handlers
- Break Outside Loop: BREAK with no break target
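- Continue Outside Loop: not raised by the VM, since there is no CONTINUE opcode; continue is compiled to JUMPs, so a misplaced continue would have to be caught by the compiler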
- Return Outside Function: RETURN with no call frame
- Unknown Function: CALL_NATIVE with unregistered function
- Mismatched Handler: POP_TRY with no handler
- Invalid Constant: PUSH with invalid constant index
- Invalid Function Definition: MAKE_FUNCTION with non-function_def constant
Edge Cases
Empty Stack
- Arithmetic/comparison ops on empty stack should throw
- RETURN with empty stack returns null
- HALT with empty stack returns null
Null Values
- Arithmetic with null coerces to 0
- Comparisons with null work normally
- Null is falsy
Scope Shadowing
- Variables in inner scopes shadow outer scopes during LOAD
- STORE updates the nearest enclosing scope where the variable is defined (current scope first, then parents)
Function Parameter Binding
- Missing positional args → use named args → use defaults → use null
- Extra positional args → collected by variadic parameter or ignored
- Extra named args → collected by kwargs parameter or ignored
- Named arg matching is case-sensitive
Tail Call Optimization
- TAIL_CALL reuses frame, so return address is from original caller
- Multiple tail calls in sequence never grow stack
- TAIL_CALL can call different function (not just self-recursive)
Break/Continue Semantics
- BREAK unwinds to frame that called the iterator function
- Multiple nested function calls: break exits all of them until reaching marked frame
- CONTINUE is implemented by the compiler using JUMPs
Exception Unwinding
- THROW unwinds call stack to handler's depth, not just to handler
- Exception handlers form a stack (nested try blocks)
- Error value on stack is available in catch block via STORE
- Finally blocks always execute, even if there's a return/break in try or catch
- Finally executes after try (if no exception) or after catch (if exception)
VM Initialization
const vm = new VM(bytecode);
vm.registerFunction('add', (a, b) => {
return { type: 'number', value: toNumber(a) + toNumber(b) }
})
const result = await vm.execute()
Testing Considerations
Unit Tests Should Cover
- Each opcode individually with minimal setup
- Type coercion for arithmetic, comparison, and logical ops
- Scope chain resolution (local, parent, global)
- Call frames (nested calls, return values)
- Exception handling (nested try blocks, unwinding, finally blocks)
- Break/continue (nested functions, iterator pattern)
- Closures (capturing variables, multiple nesting levels)
- Tail calls (self-recursive, mutual recursion)
- Parameter binding (positional, named, defaults, variadic, kwargs, combinations)
- Array/dict operations (creation, access, mutation)
- Error conditions (all error cases listed above)
- Edge cases (empty stack, null values, shadowing, etc.)
Integration Tests Should Cover
- Recursive functions (factorial, fibonacci)
- Iterator pattern (each with break)
- Closure examples (counters, adder factories)
- Exception examples (try/catch/throw chains)
- Complex scope (deeply nested functions)
- Mixed features (variadic + defaults + kwargs)
Property-Based Tests Should Cover
- Stack integrity (stack size matches expectations after ops)
- Scope integrity (variables remain accessible)
- Frame integrity (call stack unwinds correctly)
Version History
- 1.0 (2024): Initial specification
Notes
- PC increment happens after each instruction execution
- Jump instructions use relative offsets (added to current PC after increment)
- All async operations (native functions) must be awaited
- Arrays and dicts are mutable (pass by reference)
- Functions are immutable values
- The VM is single-threaded (no concurrency primitives)