ReefVM Specification
Version 1.0
Overview
The ReefVM is a stack-based bytecode virtual machine designed for the Shrimp programming language. It supports closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.
Architecture
Components
- Value Stack: Operand stack for computation
- Call Stack: Call frames for function invocations
- Exception Handlers: Stack of try/catch handlers
- Scope Chain: Linked scopes for lexical variable resolution
- Program Counter (PC): Current instruction index
- Constants Pool: Immutable values and function metadata
- TypeScript Function Registry: External functions callable from Shrimp
Execution Model
- VM loads bytecode with instructions and constants
- PC starts at instruction 0
- Each instruction is executed sequentially (unless jumps occur)
- Execution continues until HALT or end of instructions
- Final value is top of stack (or null if empty)
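The execution model above can be sketched as a small fetch-execute loop. This is a minimal illustration, not the reference implementation: it handles only five opcodes, supports only null and number values, and assumes the PC is incremented after every instruction, so jumps set the PC to the target minus one. The names run, num, and NULL are hypothetical helpers.

```typescript
type Value = { type: 'null'; value: null } | { type: 'number'; value: number }
type Instruction = {
  op: 'PUSH' | 'ADD' | 'JUMP' | 'JUMP_IF_FALSE' | 'HALT'
  operand?: number
}
type Bytecode = { instructions: Instruction[]; constants: Value[] }

const NULL: Value = { type: 'null', value: null }
// Numeric view of a value; null counts as 0 (other types are omitted in this sketch).
const num = (v: Value): number => (v.type === 'number' ? v.value : 0)

function run(bytecode: Bytecode): Value {
  const stack: Value[] = []
  const pop = (): Value => {
    const v = stack.pop()
    if (v === undefined) throw new Error('Stack underflow')
    return v
  }
  let pc = 0 // PC starts at instruction 0
  while (pc < bytecode.instructions.length) {
    const ins = bytecode.instructions[pc]
    if (ins === undefined || ins.op === 'HALT') break
    const operand = ins.operand ?? -1
    switch (ins.op) {
      case 'PUSH': {
        const constant = bytecode.constants[operand]
        if (constant === undefined) throw new Error('Invalid constant index')
        stack.push(constant)
        break
      }
      case 'ADD': {
        const b = pop() // right operand is on top of the stack
        const a = pop()
        stack.push({ type: 'number', value: num(a) + num(b) })
        break
      }
      case 'JUMP':
        pc = operand - 1 // compensate for the pc++ below
        break
      case 'JUMP_IF_FALSE':
        if (num(pop()) === 0) pc = operand - 1
        break
    }
    pc++ // sequential execution unless a jump changed the PC
  }
  return stack[stack.length - 1] ?? NULL // final value, or null if the stack is empty
}
```

Running the program PUSH 0, PUSH 1, ADD, HALT against the constants 1 and 2 leaves a number value of 3 on the stack, and an empty program yields null.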
Value Types
All runtime values are tagged unions:
type Value =
  | { type: 'null', value: null }
  | { type: 'boolean', value: boolean }
  | { type: 'number', value: number }
  | { type: 'string', value: string }
  | { type: 'array', items: Value[] }
  | { type: 'dict', entries: Map<string, Value> }
  | { type: 'function', params: string[], defaults: Record<string, Value>,
      body: number, scope: Scope, variadic: boolean, kwargs: boolean }
Type Coercion
toNumber: number → identity, string → parseFloat (or 0), boolean → 1/0, others → 0
toString: string → identity, number → string, boolean → string, null → "null", function → "", array → "[item, item]", dict → "{key: value, ...}"
isTruthy: boolean → value, number → value !== 0, string → value !== "", null → false, array → length > 0, dict → size > 0, others → true
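As a rough sketch, the toNumber and isTruthy rules above could be written as follows. The function variant is reduced to its tag here for brevity, and toString is left out; treat this as one possible reading of the rules rather than the VM's actual code.

```typescript
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; items: Value[] }
  | { type: 'dict'; entries: Map<string, Value> }
  | { type: 'function' } // remaining function fields omitted in this sketch

function toNumber(v: Value): number {
  switch (v.type) {
    case 'number':
      return v.value
    case 'string': {
      const parsed = parseFloat(v.value)
      return Number.isNaN(parsed) ? 0 : parsed // unparseable strings become 0
    }
    case 'boolean':
      return v.value ? 1 : 0
    default:
      return 0 // null, array, dict, function
  }
}

function isTruthy(v: Value): boolean {
  switch (v.type) {
    case 'boolean':
      return v.value
    case 'number':
      return v.value !== 0
    case 'string':
      return v.value !== ''
    case 'null':
      return false
    case 'array':
      return v.items.length > 0
    case 'dict':
      return v.entries.size > 0
    default:
      return true // functions are always truthy
  }
}
```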
Bytecode Format
type Bytecode = {
  instructions: Instruction[]
  constants: Constant[]
}
type Instruction = {
  op: OpCode
  operand?: number | string | { positional: number; named: number }
}
type Constant =
  | Value
  | { type: 'function_def', params: string[], defaults: Record<string, number>,
      body: number, variadic: boolean, kwargs: boolean }
Scope Chain
Variables are resolved through a linked scope chain:
class Scope {
  locals: Map<string, Value>;
  parent?: Scope;
}
Variable Resolution (LOAD):
- Check current scope's locals
- If not found, recursively check parent
- If not found anywhere, throw error
Variable Assignment (STORE):
- If variable exists in current scope, update it
- Else if variable exists in any parent scope, update it there
- Else create new variable in current scope
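This implements "assign to the innermost scope where defined" semantics: the current scope is checked first, then each parent in turn, and a new variable is created only when no scope in the chain defines the name.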
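The LOAD and STORE steps listed above can be sketched as two methods on Scope. The method names get and set are assumptions made for illustration, and Value is cut down to a single variant to keep the example short.

```typescript
type Value = { type: 'number'; value: number } // one variant is enough for this sketch

class Scope {
  locals = new Map<string, Value>()
  parent?: Scope

  constructor(parent?: Scope) {
    this.parent = parent
  }

  // LOAD: walk outward from this scope until the name is found.
  get(varName: string): Value {
    for (let s: Scope | undefined = this; s !== undefined; s = s.parent) {
      const found = s.locals.get(varName)
      if (found !== undefined) return found
    }
    throw new Error(`Undefined variable: ${varName}`)
  }

  // STORE: update the first scope in the chain that already defines the name,
  // otherwise create the variable in this scope.
  set(varName: string, value: Value): void {
    for (let s: Scope | undefined = this; s !== undefined; s = s.parent) {
      if (s.locals.has(varName)) {
        s.locals.set(varName, value)
        return
      }
    }
    this.locals.set(varName, value)
  }
}
```

With this sketch, storing to a name that exists only in a parent scope mutates the parent, which is what lets closures update captured variables.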
Call Frames
type CallFrame = {
  returnAddress: number     // Where to resume after RETURN
  returnScope: Scope        // Scope to restore after RETURN
  isBreakTarget: boolean    // Can be targeted by BREAK
  continueAddress?: number  // Where to jump for CONTINUE
}
Exception Handlers
type ExceptionHandler = {
  catchAddress: number    // Where to jump on exception
  callStackDepth: number  // Call stack depth when handler pushed
  scope: Scope            // Scope to restore in catch block
}
Opcodes
Stack Operations
PUSH
Operand: Index into constants pool (number)
Effect: Push constant onto stack
Stack: [] → [value]
POP
Operand: None
Effect: Discard top of stack
Stack: [value] → []
DUP
Operand: None
Effect: Duplicate top of stack
Stack: [value] → [value, value]
Variable Operations
LOAD
Operand: Variable name (string)
Effect: Push variable value onto stack
Stack: [] → [value]
Errors: Throws if variable not found in scope chain
STORE
Operand: Variable name (string)
Effect: Store top of stack into variable (following scope chain rules)
Stack: [value] → []
Arithmetic Operations
All arithmetic operations pop two values (b, then a), coerce both with toNumber, apply the operator, and push the result as a number.
ADD
Stack: [a, b] → [a + b]
Note: Only for numbers (use separate string concat if needed)
SUB
Stack: [a, b] → [a - b]
MUL
Stack: [a, b] → [a * b]
DIV
Stack: [a, b] → [a / b]
MOD
Stack: [a, b] → [a % b]
Comparison Operations
All comparison operations pop two values, compare, push boolean (as number 1/0).
EQ
Stack: [a, b] → [a == b ? 1 : 0]
Note: Type-aware equality
NEQ
Stack: [a, b] → [a != b ? 1 : 0]
LT
Stack: [a, b] → [a < b ? 1 : 0]
GT
Stack: [a, b] → [a > b ? 1 : 0]
LTE
Stack: [a, b] → [a <= b ? 1 : 0]
GTE
Stack: [a, b] → [a >= b ? 1 : 0]
Logical Operations
AND
Stack: [a, b] → [isTruthy(a) && isTruthy(b) ? 1 : 0]
OR
Stack: [a, b] → [isTruthy(a) || isTruthy(b) ? 1 : 0]
NOT
Stack: [a] → [!isTruthy(a)]
Control Flow
JUMP
Operand: Instruction address (number)
Effect: Set PC to address
Stack: No change
JUMP_IF_FALSE
Operand: Instruction address (number)
Effect: If top of stack is falsy, jump to address
Stack: [condition] → []
JUMP_IF_TRUE
Operand: Instruction address (number)
Effect: If top of stack is truthy, jump to address
Stack: [condition] → []
BREAK
Operand: None
Effect: Unwind call stack until frame with isBreakTarget = true, resume there
Stack: No change
Errors: Throws if no break target found
Behavior:
- Pop frames from call stack
- For each frame, restore its returnScope and returnAddress
- Stop when finding frame with isBreakTarget = true
- Resume execution at that frame's return address
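The unwinding steps for BREAK might look like the sketch below. VMState and doBreak are hypothetical names, Scope is a stand-in with just an id, and the PC is set directly (a real loop that increments the PC after each instruction would need to compensate).

```typescript
type Scope = { id: string } // stand-in for the real Scope class
type CallFrame = {
  returnAddress: number
  returnScope: Scope
  isBreakTarget: boolean
  continueAddress?: number
}
type VMState = { pc: number; scope: Scope; callStack: CallFrame[] }

function doBreak(vm: VMState): void {
  let frame = vm.callStack.pop()
  while (frame !== undefined) {
    // Every popped frame restores its saved scope and return address.
    vm.scope = frame.returnScope
    vm.pc = frame.returnAddress
    if (frame.isBreakTarget) return // resume at this frame's return address
    frame = vm.callStack.pop()
  }
  throw new Error('BREAK with no break target')
}
```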
CONTINUE
Operand: None
Effect: Unwind to nearest frame with continueAddress, jump there
Stack: No change
Errors: Throws if no continue target found
Behavior:
- Search call stack (without popping) for frame with continueAddress
- When found, restore scope and jump to continueAddress
- Pop all frames above the continue target
Exception Handling
PUSH_TRY
Operand: Catch block address (number)
Effect: Push exception handler
Stack: No change
Registers a try block. If THROW occurs before POP_TRY, execution jumps to catch address.
POP_TRY
Operand: None
Effect: Pop exception handler (try block completed without exception)
Stack: No change
Errors: Throws if no handler to pop
THROW
Operand: None
Effect: Throw exception with error value from stack
Stack: [errorValue] → (unwound)
Behavior:
- Pop error value from stack
- If no exception handlers, throw JavaScript Error with error message
- Otherwise, pop most recent exception handler
- Unwind call stack to handler's depth
- Restore handler's scope
- Push error value back onto stack
- Jump to handler's catch address
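A minimal sketch of the THROW steps, under a few assumptions: error values are limited to strings, call frames are opaque, and VMState and doThrow are illustrative names rather than the VM's real API. The PC is assigned directly, without the jump compensation a real execution loop may need.

```typescript
type Value = { type: 'string'; value: string } // error values only, for this sketch
type Scope = { id: string } // stand-in for the real Scope class
type ExceptionHandler = { catchAddress: number; callStackDepth: number; scope: Scope }
type VMState = {
  pc: number
  scope: Scope
  stack: Value[]
  callStack: unknown[]
  handlers: ExceptionHandler[]
}

function doThrow(vm: VMState): void {
  const error = vm.stack.pop()
  if (error === undefined) throw new Error('Stack underflow')

  const handler = vm.handlers.pop() // most recent handler wins
  if (handler === undefined) {
    throw new Error(`Uncaught exception: ${error.value}`)
  }

  // Unwind every frame pushed after the handler was registered.
  while (vm.callStack.length > handler.callStackDepth) vm.callStack.pop()

  vm.scope = handler.scope
  vm.stack.push(error) // the catch block can STORE it
  vm.pc = handler.catchAddress
}
```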
Function Operations
MAKE_FUNCTION
Operand: Index into constants pool (number)
Effect: Create function value, capturing current scope
Stack: [] → [function]
The constant must be a function_def with:
- params: Parameter names
- defaults: Map of param names to constant indices for default values
- body: Instruction address of function body
- variadic: If true, last param collects remaining positional args as array
- kwargs: If true, last param collects all named args as dict
The created function captures currentScope as its parentScope.
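One way to picture MAKE_FUNCTION is as a conversion from a function_def constant to a function value. The sketch below assumes that default-value indices are resolved against the constants pool at creation time, which the differing defaults types of the two shapes suggest but the text does not state outright. makeFunction and FunctionDef are hypothetical names.

```typescript
type Scope = { locals: Map<string, Value>; parent?: Scope }
type Value =
  | { type: 'number'; value: number }
  | { type: 'function'; params: string[]; defaults: Record<string, Value>;
      body: number; scope: Scope; variadic: boolean; kwargs: boolean }
type FunctionDef = {
  type: 'function_def'; params: string[]; defaults: Record<string, number>;
  body: number; variadic: boolean; kwargs: boolean
}
type Constant = Value | FunctionDef

function makeFunction(constants: Constant[], index: number, currentScope: Scope): Value {
  const def = constants[index]
  if (def === undefined || def.type !== 'function_def') {
    throw new Error('MAKE_FUNCTION requires a function_def constant')
  }
  // Assumption: each default is a constant index, resolved to a Value here.
  const defaults: Record<string, Value> = {}
  for (const [param, constIndex] of Object.entries(def.defaults)) {
    const constant = constants[constIndex]
    if (constant === undefined || constant.type === 'function_def') {
      throw new Error(`Invalid default constant for ${param}`)
    }
    defaults[param] = constant
  }
  return {
    type: 'function',
    params: def.params,
    defaults,
    body: def.body,
    scope: currentScope, // captured scope, used as the parent of each call's scope
    variadic: def.variadic,
    kwargs: def.kwargs,
  }
}
```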
CALL
Operand: Either:
- Number: positional argument count
- Object: { positional: number, named: number }
Stack: [fn, arg1, arg2, ..., name1, val1, name2, val2, ...] → [returnValue]
Behavior:
- Pop named arguments (name/value pairs) according to operand
- Pop positional arguments according to operand
- Pop function from stack (it sits below all of its arguments)
- Mark current frame (if exists) as break target (isBreakTarget = true)
- Push new call frame with current PC and scope
- Create new scope with function's parentScope as parent
- Bind parameters:
- For regular functions: bind params by position, then by name, then defaults, then null
- For variadic functions: bind fixed params, collect rest into array
- For kwargs functions: bind fixed params, collect named args into dict
- Set currentScope to new scope
- Jump to function body
Parameter Binding Priority:
- Named argument (if provided)
- Positional argument (if provided)
- Default value (if defined)
- Null
Errors: Throws if the value in the function position (below the arguments) is not a function
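The parameter binding priority for regular (non-variadic, non-kwargs) functions can be sketched as a single lookup chain. bindParams is a hypothetical helper, and the sketch assumes the arguments have already been popped off the stack into an array and a map.

```typescript
type Value =
  | { type: 'null'; value: null }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }

function bindParams(
  params: string[],
  defaults: Record<string, Value>,
  positional: Value[],
  named: Map<string, Value>,
): Map<string, Value> {
  const locals = new Map<string, Value>()
  params.forEach((param, i) => {
    const value: Value =
      named.get(param) ?? // 1. named argument
      positional[i] ?? // 2. positional argument
      defaults[param] ?? // 3. default value
      { type: 'null', value: null } // 4. null
    locals.set(param, value)
  })
  // Extra positional and named arguments are ignored in this sketch.
  return locals
}
```

For the mkdir call shown under Named Function Call, a function with params path and recursive would bind path from the positional argument and recursive from the named one.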
TAIL_CALL
Operand: Same as CALL
Effect: Same as CALL, but reuses current call frame
Stack: Same as CALL
Behavior: Identical to CALL except:
- Does NOT push a new call frame
- Replaces currentScope instead of creating nested scope
- Enables unbounded tail recursion without stack overflow
RETURN
Operand: None
Effect: Return from function
Stack: [returnValue] → (restored stack with returnValue on top)
Behavior:
- Pop return value (or null if stack empty)
- Pop call frame
- Restore scope from frame
- Set PC to frame's return address
- Push return value onto stack
Errors: Throws if no call frame to return from
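The RETURN steps translate almost line for line into code. The sketch below uses hypothetical VMState and doReturn names, a stand-in Scope, and sets the PC directly.

```typescript
type Value = { type: 'null'; value: null } | { type: 'number'; value: number }
type Scope = { id: string } // stand-in for the real Scope class
type CallFrame = { returnAddress: number; returnScope: Scope; isBreakTarget: boolean }
type VMState = { pc: number; scope: Scope; stack: Value[]; callStack: CallFrame[] }

function doReturn(vm: VMState): void {
  // An empty stack returns null rather than failing.
  const returnValue: Value = vm.stack.pop() ?? { type: 'null', value: null }
  const frame = vm.callStack.pop()
  if (frame === undefined) throw new Error('RETURN with no call frame')
  vm.scope = frame.returnScope
  vm.pc = frame.returnAddress
  vm.stack.push(returnValue)
}
```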
Array Operations
MAKE_ARRAY
Operand: Number of items (number)
Effect: Create array from N stack items
Stack: [item1, item2, ..., itemN] → [array]
Items are popped in reverse order (item1 is array[0]).
ARRAY_GET
Operand: None
Effect: Get array element at index
Stack: [array, index] → [value]
Errors: Throws if not array or index out of bounds
Index is coerced to number and floored.
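As an illustration of the index handling, the sketch below coerces the index with a reduced toNumber, floors it, and rejects anything outside the array. arrayGet is a hypothetical helper name, and only the value types needed for the example are modeled.

```typescript
type Value =
  | { type: 'null'; value: null }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; items: Value[] }

// Reduced coercion: numbers pass through, strings are parsed, the rest is 0.
function toNumber(v: Value): number {
  if (v.type === 'number') return v.value
  if (v.type === 'string') {
    const parsed = parseFloat(v.value)
    return Number.isNaN(parsed) ? 0 : parsed
  }
  return 0
}

function arrayGet(target: Value, index: Value): Value {
  if (target.type !== 'array') throw new Error('ARRAY_GET on non-array')
  const i = Math.floor(toNumber(index)) // e.g. 1.9 becomes 1
  const item = i >= 0 ? target.items[i] : undefined
  if (item === undefined) throw new Error(`Index out of bounds: ${i}`)
  return item
}
```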
ARRAY_SET
Operand: None
Effect: Set array element at index (mutates array)
Stack: [array, index, value] → []
Errors: Throws if not array or index out of bounds
ARRAY_LEN
Operand: None
Effect: Get array length
Stack: [array] → [length]
Errors: Throws if not array
Dictionary Operations
MAKE_DICT
Operand: Number of key-value pairs (number)
Effect: Create dict from N key-value pairs
Stack: [key1, val1, key2, val2, ...] → [dict]
Keys are coerced to strings.
DICT_GET
Operand: None
Effect: Get dict value for key
Stack: [dict, key] → [value]
Returns null if key not found. Key is coerced to string.
Errors: Throws if not dict
DICT_SET
Operand: None
Effect: Set dict value for key (mutates dict)
Stack: [dict, key, value] → []
Key is coerced to string.
Errors: Throws if not dict
DICT_HAS
Operand: None
Effect: Check if key exists in dict
Stack: [dict, key] → [boolean]
Key is coerced to string.
Errors: Throws if not dict
TypeScript Interop
CALL_TYPESCRIPT
Operand: Function name (string)
Effect: Call registered TypeScript function
Stack: [...args] → [returnValue]
Behavior:
- Look up function by name in registry
- Mark current frame (if exists) as break target
- Await function call (TypeScript function receives arguments and returns a Value)
- Push return value onto stack
Notes:
- TypeScript functions are passed the raw stack values as arguments
- They must return a valid Value
- They can be async (VM awaits them)
- Like CALL, but function is from TypeScript registry instead of stack
Errors: Throws if function not found
TypeScript Function Signature:
type TypeScriptFunction = (...args: Value[]) => Promise<Value> | Value;
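The registry lookup and await behavior could be sketched as below. FunctionRegistry, register, and call are illustrative names and not necessarily the VM's actual API. The sketch assumes the arguments have already been taken off the stack.

```typescript
type Value = { type: 'null'; value: null } | { type: 'number'; value: number }
type TypeScriptFunction = (...args: Value[]) => Promise<Value> | Value

class FunctionRegistry {
  private fns = new Map<string, TypeScriptFunction>()

  register(fnName: string, fn: TypeScriptFunction): void {
    this.fns.set(fnName, fn)
  }

  // Awaiting works for both sync and async functions.
  async call(fnName: string, args: Value[]): Promise<Value> {
    const fn = this.fns.get(fnName)
    if (fn === undefined) throw new Error(`Unknown function: ${fnName}`)
    return await fn(...args)
  }
}
```

Because call always awaits, a registered function may return either a plain Value or a Promise of one without the caller needing to know which.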
Special
HALT
Operand: None
Effect: Stop execution
Stack: No change
Common Bytecode Patterns
If-Else Statement
LOAD 'x'
PUSH 5
GT
JUMP_IF_FALSE else_label
# then block
JUMP end_label
else_label:
# else block
end_label:
While Loop
loop_start:
# condition
JUMP_IF_FALSE loop_end
# body
JUMP loop_start
loop_end:
Function Definition
MAKE_FUNCTION <index>
STORE 'functionName'
JUMP skip_body
function_body:
# function code
RETURN
skip_body:
Try-Catch
PUSH_TRY catch_label
# try block
POP_TRY
JUMP end_label
catch_label:
STORE 'errorVar' # Error is on stack
# catch block
end_label:
Named Function Call
LOAD 'mkdir'
PUSH 'src/bin' # positional arg
PUSH 'recursive' # name
PUSH true # value
CALL { positional: 1, named: 1 }
Tail Recursive Function
MAKE_FUNCTION <factorial_def>
STORE 'factorial'
JUMP main
factorial_body:
LOAD 'n'
PUSH 0
EQ
JUMP_IF_FALSE recurse
LOAD 'acc'
RETURN
recurse:
LOAD 'factorial'
LOAD 'n'
PUSH 1
SUB
LOAD 'n'
LOAD 'acc'
MUL
TAIL_CALL 2 # No stack growth!
main:
LOAD 'factorial'
PUSH 5
PUSH 1
CALL 2
Error Conditions
Runtime Errors
All of these should throw errors:
- Undefined Variable: LOAD of non-existent variable
- Type Mismatch: ARRAY_GET on non-array, DICT_GET on non-dict, CALL on non-function
- Index Out of Bounds: ARRAY_GET/SET with invalid index
- Stack Underflow: Arithmetic ops without enough operands
- Uncaught Exception: THROW with no exception handlers
- Break Outside Loop: BREAK with no break target
- Continue Outside Loop: CONTINUE with no continue target
- Return Outside Function: RETURN with no call frame
- Unknown Function: CALL_TYPESCRIPT with unregistered function
- Mismatched Handler: POP_TRY with no handler
- Invalid Constant: PUSH with invalid constant index
- Invalid Function Definition: MAKE_FUNCTION with non-function_def constant
Edge Cases
Empty Stack
- Arithmetic/comparison ops on empty stack should throw
- RETURN with empty stack returns null
- HALT with empty stack returns null
Null Values
- Arithmetic with null coerces to 0
- Comparisons with null work normally
- Null is falsy
Scope Shadowing
- Variables in inner scopes shadow outer scopes during LOAD
- STORE updates the innermost scope where the variable is already defined (current scope first, then parents)
Function Parameter Binding
- Missing positional args → use named args → use defaults → use null
- Extra positional args → collected by variadic parameter or ignored
- Extra named args → collected by kwargs parameter or ignored
- Named arg matching is case-sensitive
Tail Call Optimization
- TAIL_CALL reuses frame, so return address is from original caller
- Multiple tail calls in sequence never grow stack
- TAIL_CALL can call different function (not just self-recursive)
Break/Continue Semantics
- BREAK unwinds to frame that called the iterator function
- Multiple nested function calls: break exits all of them until reaching marked frame
- CONTINUE requires explicit continueAddress in frame (set by compiler for loops)
Exception Unwinding
- THROW unwinds call stack to handler's depth, not just to handler
- Exception handlers form a stack (nested try blocks)
- Error value on stack is available in catch block via STORE
VM Initialization
const vm = new VM(bytecode)
vm.registerFunction('add', (a, b) => {
  return { type: 'number', value: toNumber(a) + toNumber(b) }
})
const result = await vm.execute()
Testing Considerations
Unit Tests Should Cover
- Each opcode individually with minimal setup
- Type coercion for arithmetic, comparison, and logical ops
- Scope chain resolution (local, parent, global)
- Call frames (nested calls, return values)
- Exception handling (nested try blocks, unwinding)
- Break/continue (nested functions, iterator pattern)
- Closures (capturing variables, multiple nesting levels)
- Tail calls (self-recursive, mutual recursion)
- Parameter binding (positional, named, defaults, variadic, kwargs, combinations)
- Array/dict operations (creation, access, mutation)
- Error conditions (all error cases listed above)
- Edge cases (empty stack, null values, shadowing, etc.)
Integration Tests Should Cover
- Recursive functions (factorial, fibonacci)
- Iterator pattern (each with break)
- Closure examples (counters, adder factories)
- Exception examples (try/catch/throw chains)
- Complex scope (deeply nested functions)
- Mixed features (variadic + defaults + kwargs)
Property-Based Tests Should Cover
- Stack integrity (stack size matches expectations after ops)
- Scope integrity (variables remain accessible)
- Frame integrity (call stack unwinds correctly)
Version History
- 1.0 (2024): Initial specification
Notes
- PC increment happens after each instruction execution
- Jump instructions compensate for automatic PC increment (subtract 1)
- All async operations (TypeScript functions) must be awaited
- Arrays and dicts are mutable (pass by reference)
- Functions are immutable values
- The VM is single-threaded (no concurrency primitives)