ReefVM/SPEC.md
2025-10-05 15:21:51 -07:00

# ReefVM Specification
Version 1.0
## Overview
The ReefVM is a stack-based bytecode virtual machine designed for the Shrimp programming language. It supports closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.
## Architecture
### Components
- **Value Stack**: Operand stack for computation
- **Call Stack**: Call frames for function invocations
- **Exception Handlers**: Stack of try/catch handlers
- **Scope Chain**: Linked scopes for lexical variable resolution
- **Program Counter (PC)**: Current instruction index
- **Constants Pool**: Immutable values and function metadata
- **TypeScript Function Registry**: External functions callable from Shrimp
### Execution Model
1. VM loads bytecode with instructions and constants
2. PC starts at instruction 0
3. Each instruction is executed sequentially (unless jumps occur)
4. Execution continues until HALT or end of instructions
5. Final value is top of stack (or null if empty)
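The five steps above can be sketched as a small fetch-execute loop. This is an illustrative sketch only, not the ReefVM implementation: it models just PUSH, ADD, and HALT, the `run` function name is hypothetical, and it is synchronous for brevity even though the real `execute()` is awaited.

```typescript
// Simplified types: only what PUSH, ADD, and HALT need (assumption).
type Value = { type: 'null'; value: null } | { type: 'number'; value: number };
type Instruction = { op: 'PUSH' | 'ADD' | 'HALT'; operand?: number };
type Bytecode = { instructions: Instruction[]; constants: Value[] };

function run(bytecode: Bytecode): Value {
  const stack: Value[] = [];
  let pc = 0; // step 2: PC starts at instruction 0

  // step 4: stop at HALT or when PC runs past the last instruction
  while (pc < bytecode.instructions.length) {
    const instr = bytecode.instructions[pc];
    pc += 1; // PC is incremented once per executed instruction
    if (instr.op === 'HALT') break;
    if (instr.op === 'PUSH') {
      const constant = bytecode.constants[instr.operand ?? -1];
      if (constant === undefined) throw new Error('Invalid constant index');
      stack.push(constant);
    } else {
      // ADD: pop b, then a, and push a + b (null coerces to 0)
      const b = stack.pop();
      const a = stack.pop();
      if (a === undefined || b === undefined) throw new Error('Stack underflow');
      stack.push({ type: 'number', value: Number(a.value) + Number(b.value) });
    }
  }

  // step 5: the result is the top of the stack, or null if it is empty
  return stack[stack.length - 1] ?? { type: 'null', value: null };
}
```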
## Value Types
All runtime values are tagged unions:
```typescript
type Value =
  | { type: 'null', value: null }
  | { type: 'boolean', value: boolean }
  | { type: 'number', value: number }
  | { type: 'string', value: string }
  | { type: 'array', value: Value[] }
  | { type: 'dict', value: Map<string, Value> }
  | { type: 'function', params: string[], defaults: Record<string, Value>,
      body: number, parentScope: Scope, variadic: boolean, kwargs: boolean }
```
### Type Coercion
**toNumber**: number → identity, string → parseFloat (or 0), boolean → 1/0, others → 0
**toString**: string → identity, number → string, boolean → string, null → "null", function → "<function>", array → "[item, item]", dict → "{key: value, ...}"
**isTrue**: Only `null` and `false` are falsy. Everything else (including `0`, `""`, empty arrays, empty dicts) is truthy.
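A minimal sketch of the three coercions, assuming the rules exactly as listed above. The function payload is omitted from `Value` for brevity, and `toStr` stands in for the spec's toString (renamed here only to avoid clashing with the global `toString`).

```typescript
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: Value[] }
  | { type: 'dict'; value: Map<string, Value> }
  | { type: 'function' }; // function fields omitted in this sketch

function toNumber(v: Value): number {
  switch (v.type) {
    case 'number': return v.value;
    case 'string': {
      const n = parseFloat(v.value);
      return Number.isNaN(n) ? 0 : n; // unparseable strings become 0
    }
    case 'boolean': return v.value ? 1 : 0;
    default: return 0; // null, array, dict, function
  }
}

function toStr(v: Value): string {
  switch (v.type) {
    case 'string': return v.value;
    case 'number':
    case 'boolean': return String(v.value);
    case 'null': return 'null';
    case 'function': return '<function>';
    case 'array': return `[${v.value.map((item) => toStr(item)).join(', ')}]`;
    case 'dict':
      return `{${Array.from(v.value.entries())
        .map(([key, item]) => `${key}: ${toStr(item)}`)
        .join(', ')}}`;
  }
}

function isTrue(v: Value): boolean {
  if (v.type === 'null') return false;
  if (v.type === 'boolean') return v.value;
  return true; // 0, "", empty arrays, and empty dicts are all truthy
}
```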
## Bytecode Format
```typescript
type Bytecode = {
  instructions: Instruction[]
  constants: Constant[]
}
type Instruction = {
  op: OpCode
  operand?: number | string | { positional: number; named: number }
}
type Constant =
  | Value
  | { type: 'function_def', params: string[], defaults: Record<string, number>,
      body: number, variadic: boolean, kwargs: boolean }
```
## Scope Chain
Variables are resolved through a linked scope chain:
```typescript
class Scope {
  locals: Map<string, Value>;
  parent?: Scope;
}
```
**Variable Resolution (LOAD)**:
1. Check current scope's locals
2. If not found, recursively check parent
3. If not found anywhere, throw error
**Variable Assignment (STORE)**:
1. If variable exists in current scope, update it
2. Else if variable exists in any parent scope, update it there
3. Else create new variable in current scope
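This implements "assign where already defined" semantics: STORE updates the nearest scope in the chain (starting with the current one) that already defines the variable, and creates a new local only when no scope in the chain has it.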
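The LOAD and STORE rules can be sketched as two methods on `Scope`. The method names `get` and `set` are hypothetical, `Value` is narrowed to numbers for brevity, and the sketch assumes that when several parents define the name, the nearest one is updated.

```typescript
type Value = { type: 'number'; value: number }; // narrowed for brevity

class Scope {
  locals = new Map<string, Value>();
  parent?: Scope;

  constructor(parent?: Scope) {
    this.parent = parent;
  }

  // LOAD: walk from the current scope outward; throw when nothing matches.
  get(name: string): Value {
    for (let s: Scope | undefined = this; s !== undefined; s = s.parent) {
      const found = s.locals.get(name);
      if (found !== undefined) return found;
    }
    throw new Error(`Undefined variable: ${name}`);
  }

  // STORE: update the first scope (starting here) that already has the
  // name; otherwise create the variable in the current scope.
  set(name: string, value: Value): void {
    for (let s: Scope | undefined = this; s !== undefined; s = s.parent) {
      if (s.locals.has(name)) {
        s.locals.set(name, value);
        return;
      }
    }
    this.locals.set(name, value);
  }
}
```

A closure that stores to a variable defined in its parent scope therefore mutates the parent's binding rather than shadowing it.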
## Call Frames
```typescript
type CallFrame = {
  returnAddress: number     // Where to resume after RETURN
  returnScope: Scope        // Scope to restore after RETURN
  isBreakTarget: boolean    // Can be targeted by BREAK
  continueAddress?: number  // Where to jump for CONTINUE
}
```
## Exception Handlers
```typescript
type ExceptionHandler = {
  catchAddress: number    // Where to jump on exception
  callStackDepth: number  // Call stack depth when handler pushed
  scope: Scope            // Scope to restore in catch block
}
```
## Opcodes
### Stack Operations
#### PUSH
**Operand**: Index into constants pool (number)
**Effect**: Push constant onto stack
**Stack**: [] → [value]
#### POP
**Operand**: None
**Effect**: Discard top of stack
**Stack**: [value] → []
#### DUP
**Operand**: None
**Effect**: Duplicate top of stack
**Stack**: [value] → [value, value]
### Variable Operations
#### LOAD
**Operand**: Variable name (string)
**Effect**: Push variable value onto stack
**Stack**: [] → [value]
**Errors**: Throws if variable not found in scope chain
#### STORE
**Operand**: Variable name (string)
**Effect**: Store top of stack into variable (following scope chain rules)
**Stack**: [value] → []
### Arithmetic Operations
All arithmetic operations pop two values (`b` first, then `a`), coerce them with toNumber, perform the operation, and push the result as a number.
#### ADD
**Stack**: [a, b] → [a + b]
**Note**: Only for numbers (use separate string concat if needed)
#### SUB
**Stack**: [a, b] → [a - b]
#### MUL
**Stack**: [a, b] → [a * b]
#### DIV
**Stack**: [a, b] → [a / b]
#### MOD
**Stack**: [a, b] → [a % b]
### Comparison Operations
All comparison operations pop two values (`b` first, then `a`), compare them, and push a boolean result.
#### EQ
**Stack**: [a, b] → [boolean]
**Note**: Type-aware equality (deep comparison for arrays/dicts)
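One way to read "type-aware, deep" equality is sketched below. The `valuesEqual` name is hypothetical, functions are left out of `Value`, and the sketch assumes that values with different type tags are never equal (so the number `1` and the string `"1"` differ).

```typescript
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: Value[] }
  | { type: 'dict'; value: Map<string, Value> };

function valuesEqual(a: Value, b: Value): boolean {
  if (a.type !== b.type) return false; // assumption: no cross-type equality

  if (a.type === 'array' && b.type === 'array') {
    const other = b.value;
    // Same length and pairwise-equal elements
    return (
      a.value.length === other.length &&
      a.value.every((item, i) => valuesEqual(item, other[i]))
    );
  }

  if (a.type === 'dict' && b.type === 'dict') {
    const other = b.value;
    if (a.value.size !== other.size) return false;
    // Same key set and equal value under every key
    return Array.from(a.value.entries()).every(([key, item]) => {
      const match = other.get(key);
      return match !== undefined && valuesEqual(item, match);
    });
  }

  return a.value === b.value; // primitives: compare payloads directly
}
```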
#### NEQ
**Stack**: [a, b] → [boolean]
#### LT
**Stack**: [a, b] → [boolean]
**Note**: Numeric comparison (values coerced to numbers)
#### GT
**Stack**: [a, b] → [boolean]
**Note**: Numeric comparison (values coerced to numbers)
#### LTE
**Stack**: [a, b] → [boolean]
**Note**: Numeric comparison (values coerced to numbers)
#### GTE
**Stack**: [a, b] → [boolean]
**Note**: Numeric comparison (values coerced to numbers)
### Logical Operations
#### NOT
**Stack**: [a] → [!isTrue(a)]
**Note on AND/OR**: There are no AND/OR opcodes. Short-circuiting logical operations are implemented at the compiler level using JUMP instructions:
**AND pattern** (short-circuits if left side is false):
```
<evaluate left>
DUP
JUMP_IF_FALSE 2 # pops the duplicate; skips POP and <evaluate right> (offset = 1 + length of <evaluate right>, here 1)
POP
<evaluate right>
end:
```
**OR pattern** (short-circuits if left side is true):
```
<evaluate left>
DUP
JUMP_IF_TRUE 2 # pops the duplicate; skips POP and <evaluate right> (offset = 1 + length of <evaluate right>, here 1)
POP
<evaluate right>
end:
```
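Because the jump offset depends on how many instructions the right-hand side compiles to, a compiler would normally compute it rather than hard-code `2`. The sketch below shows one way to do that; `emitShortCircuit`, `emitAnd`, and `emitOr` are hypothetical helper names, and the offset convention (added to the PC after it has been incremented) is the one stated in the Notes section.

```typescript
type Instruction = { op: string; operand?: number | string };

// `left` and `right` are the already-compiled instruction sequences
// for the two operands.
function emitShortCircuit(
  jumpOp: 'JUMP_IF_FALSE' | 'JUMP_IF_TRUE',
  left: Instruction[],
  right: Instruction[],
): Instruction[] {
  return [
    ...left,
    { op: 'DUP' }, // keep a copy: the conditional jump pops its operand
    // Skip the POP plus every instruction of the right operand.
    { op: jumpOp, operand: right.length + 1 },
    { op: 'POP' }, // left value was not decisive; discard it
    ...right,
  ];
}

const emitAnd = (left: Instruction[], right: Instruction[]) =>
  emitShortCircuit('JUMP_IF_FALSE', left, right);
const emitOr = (left: Instruction[], right: Instruction[]) =>
  emitShortCircuit('JUMP_IF_TRUE', left, right);
```

When the jump is taken, the original left value is still on the stack and becomes the result of the whole expression.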
### Control Flow
#### JUMP
**Operand**: Offset (number)
**Effect**: Add offset to PC (relative jump)
**Stack**: No change
#### JUMP_IF_FALSE
**Operand**: Offset (number)
**Effect**: If top of stack is falsy, add offset to PC (relative jump)
**Stack**: [condition] → []
#### JUMP_IF_TRUE
**Operand**: Offset (number)
**Effect**: If top of stack is truthy, add offset to PC (relative jump)
**Stack**: [condition] → []
#### BREAK
**Operand**: None
**Effect**: Unwind call stack until frame with `isBreakTarget = true`, resume there
**Stack**: No change
**Errors**: Throws if no break target found
**Behavior**:
1. Pop frames from call stack
2. For each frame, restore its returnScope and returnAddress
3. Stop when finding frame with `isBreakTarget = true`
4. Resume execution at that frame's return address
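The unwinding steps above can be sketched as follows. `VMState` and `executeBreak` are hypothetical names, and `Scope` is reduced to a stand-in type; only the frame fields from the Call Frames section are assumed.

```typescript
type Scope = { name: string }; // stand-in for the real Scope class
type CallFrame = {
  returnAddress: number;
  returnScope: Scope;
  isBreakTarget: boolean;
  continueAddress?: number;
};
type VMState = { pc: number; currentScope: Scope; callStack: CallFrame[] };

function executeBreak(vm: VMState): void {
  let frame = vm.callStack.pop(); // step 1: pop frames one at a time
  while (frame !== undefined) {
    // step 2: restore the popped frame's scope and return address
    vm.currentScope = frame.returnScope;
    vm.pc = frame.returnAddress;
    // steps 3-4: stop at the first break target and resume from there
    if (frame.isBreakTarget) return;
    frame = vm.callStack.pop();
  }
  throw new Error('BREAK with no break target');
}
```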
#### CONTINUE
**Operand**: None
**Effect**: Unwind to nearest frame with `continueAddress`, jump there
**Stack**: No change
**Errors**: Throws if no continue target found
**Behavior**:
1. Search call stack (without popping) for frame with `continueAddress`
2. When found, restore scope and jump to `continueAddress`
3. Pop all frames above the continue target
### Exception Handling
#### PUSH_TRY
**Operand**: Catch block address (number)
**Effect**: Push exception handler
**Stack**: No change
Registers a try block. If THROW occurs before POP_TRY, execution jumps to catch address.
#### POP_TRY
**Operand**: None
**Effect**: Pop exception handler (try block completed without exception)
**Stack**: No change
**Errors**: Throws if no handler to pop
#### THROW
**Operand**: None
**Effect**: Throw exception with error value from stack
**Stack**: [errorValue] → (unwound)
**Behavior**:
1. Pop error value from stack
2. If no exception handlers, throw JavaScript Error with error message
3. Otherwise, pop most recent exception handler
4. Unwind call stack to handler's depth
5. Restore handler's scope
6. Push error value back onto stack
7. Jump to handler's catch address
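A sketch of the seven steps, under the assumption that VM state is held in a plain object. `VMState` and `executeThrow` are hypothetical names, `Value` is narrowed to strings, and the stack-underflow check is an addition the spec does not describe for THROW.

```typescript
type Scope = { id: string }; // stand-in for the real Scope class
type Value = { type: 'string'; value: string }; // narrowed for brevity
type ExceptionHandler = { catchAddress: number; callStackDepth: number; scope: Scope };
type VMState = {
  pc: number;
  currentScope: Scope;
  stack: Value[];
  callStack: unknown[];
  handlers: ExceptionHandler[];
};

function executeThrow(vm: VMState): void {
  const error = vm.stack.pop(); // step 1
  if (error === undefined) throw new Error('Stack underflow'); // assumption

  const handler = vm.handlers.pop(); // step 3
  if (handler === undefined) throw new Error(error.value); // step 2: uncaught

  vm.callStack.splice(handler.callStackDepth); // step 4: drop newer frames
  vm.currentScope = handler.scope; // step 5
  vm.stack.push(error); // step 6: catch block sees the error on the stack
  vm.pc = handler.catchAddress; // step 7
}
```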
### Function Operations
#### MAKE_FUNCTION
**Operand**: Index into constants pool (number)
**Effect**: Create function value, capturing current scope
**Stack**: [] → [function]
The constant must be a `function_def` with:
- `params`: Parameter names
- `defaults`: Map of param names to constant indices for default values
- `body`: Instruction address of function body
- `variadic`: If true, last param collects remaining positional args as array
- `kwargs`: If true, last param collects all named args as dict
The created function captures `currentScope` as its `parentScope`.
#### CALL
**Operand**: Either:
- Number: positional argument count
- Object: `{ positional: number, named: number }`
**Stack**: [fn, arg1, arg2, ..., name1, val1, name2, val2, ...] → [returnValue]
**Behavior**:
1. Pop function from stack
2. Pop named arguments (name/value pairs) according to operand
3. Pop positional arguments according to operand
4. Mark current frame (if exists) as break target (`isBreakTarget = true`)
5. Push new call frame with current PC and scope
6. Create new scope with function's parentScope as parent
7. Bind parameters:
- For regular functions: bind params by position, then by name, then defaults, then null
- For variadic functions: bind fixed params, collect rest into array
- For kwargs functions: bind fixed params, collect named args into dict
8. Set currentScope to new scope
9. Jump to function body
**Parameter Binding Priority**:
1. Named argument (if provided)
2. Positional argument (if provided)
3. Default value (if defined)
4. Null
**Errors**: Throws if top of stack is not a function
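For regular (non-variadic, non-kwargs) functions, the priority list can be sketched as a single lookup chain per parameter. This sketch reads the list literally, so a named argument wins when both a named and a positional argument target the same parameter; that reading, the `bindParams` name, and the narrowed `Value` type are assumptions. Variadic and kwargs collection are not modeled.

```typescript
type Value = { type: 'null'; value: null } | { type: 'number'; value: number };
const NULL: Value = { type: 'null', value: null };

function bindParams(
  params: string[],
  defaults: Record<string, Value>,
  positional: Value[],
  named: Map<string, Value>,
): Map<string, Value> {
  const locals = new Map<string, Value>();
  params.forEach((name, i) => {
    // Priority: named argument, positional argument, default, then null.
    const value = named.get(name) ?? positional[i] ?? defaults[name] ?? NULL;
    locals.set(name, value);
  });
  return locals;
}
```

The returned map would become the `locals` of the new scope created in step 6.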
#### TAIL_CALL
**Operand**: Same as CALL
**Effect**: Same as CALL, but reuses current call frame
**Stack**: Same as CALL
**Behavior**: Identical to CALL except:
- Does NOT push a new call frame
- Replaces currentScope instead of creating nested scope
- Enables unbounded tail recursion without stack overflow
#### RETURN
**Operand**: None
**Effect**: Return from function
**Stack**: [returnValue] → (restored stack with returnValue on top)
**Behavior**:
1. Pop return value (or null if stack empty)
2. Pop call frame
3. Restore scope from frame
4. Set PC to frame's return address
5. Push return value onto stack
**Errors**: Throws if no call frame to return from
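The RETURN steps map directly onto a few lines of code. In this sketch `VMState` and `executeReturn` are hypothetical names, and the value stack is not truncated on return because the spec does not describe frames recording a stack height.

```typescript
type Scope = { name: string }; // stand-in for the real Scope class
type Value = { type: 'null'; value: null } | { type: 'number'; value: number };
type CallFrame = { returnAddress: number; returnScope: Scope };
type VMState = { pc: number; currentScope: Scope; stack: Value[]; callStack: CallFrame[] };

function executeReturn(vm: VMState): void {
  // step 1: pop the return value, defaulting to null on an empty stack
  const returnValue: Value = vm.stack.pop() ?? { type: 'null', value: null };
  const frame = vm.callStack.pop(); // step 2
  if (frame === undefined) throw new Error('RETURN with no call frame');
  vm.currentScope = frame.returnScope; // step 3
  vm.pc = frame.returnAddress; // step 4
  vm.stack.push(returnValue); // step 5
}
```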
### Array Operations
#### MAKE_ARRAY
**Operand**: Number of items (number)
**Effect**: Create array from N stack items
**Stack**: [item1, item2, ..., itemN] → [array]
Items are popped in reverse order (item1 is array[0]).
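A sketch of how the ordering works out: removing the top N items as one slice keeps them in push order, which is equivalent to popping them one by one and filling the array from the back. `makeArray` is a hypothetical helper name and `Value` is narrowed for brevity.

```typescript
type Value =
  | { type: 'number'; value: number }
  | { type: 'array'; value: Value[] };

function makeArray(stack: Value[], count: number): void {
  if (stack.length < count) throw new Error('Stack underflow');
  // splice removes the top `count` items and returns them bottom-first,
  // so the earliest-pushed item (item1) ends up at index 0.
  const items = stack.splice(stack.length - count, count);
  stack.push({ type: 'array', value: items });
}
```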
#### ARRAY_GET
**Operand**: None
**Effect**: Get array element at index
**Stack**: [array, index] → [value]
**Errors**: Throws if not array or index out of bounds
Index is coerced to number and floored.
#### ARRAY_SET
**Operand**: None
**Effect**: Set array element at index (mutates array)
**Stack**: [array, index, value] → []
**Errors**: Throws if not array or index out of bounds
#### ARRAY_LEN
**Operand**: None
**Effect**: Get array length
**Stack**: [array] → [length]
**Errors**: Throws if not array
### Dictionary Operations
#### MAKE_DICT
**Operand**: Number of key-value pairs (number)
**Effect**: Create dict from N key-value pairs
**Stack**: [key1, val1, key2, val2, ...] → [dict]
Keys are coerced to strings.
#### DICT_GET
**Operand**: None
**Effect**: Get dict value for key
**Stack**: [dict, key] → [value]
Returns null if key not found. Key is coerced to string.
**Errors**: Throws if not dict
#### DICT_SET
**Operand**: None
**Effect**: Set dict value for key (mutates dict)
**Stack**: [dict, key, value] → []
Key is coerced to string.
**Errors**: Throws if not dict
#### DICT_HAS
**Operand**: None
**Effect**: Check if key exists in dict
**Stack**: [dict, key] → [boolean]
Key is coerced to string.
**Errors**: Throws if not dict
### TypeScript Interop
#### CALL_TYPESCRIPT
**Operand**: Function name (string)
**Effect**: Call registered TypeScript function
**Stack**: [...args] → [returnValue]
**Behavior**:
1. Look up function by name in registry
2. Mark current frame (if exists) as break target
3. Await function call (TypeScript function receives arguments and returns a Value)
4. Push return value onto stack
**Notes**:
- TypeScript functions are passed the raw stack values as arguments
- They must return a valid Value
- They can be async (VM awaits them)
- Like CALL, but function is from TypeScript registry instead of stack
**Errors**: Throws if function not found
**TypeScript Function Signature**:
```typescript
type TypeScriptFunction = (...args: Value[]) => Promise<Value> | Value;
```
### Special
#### HALT
**Operand**: None
**Effect**: Stop execution
**Stack**: No change
## Common Bytecode Patterns
### If-Else Statement
```
LOAD 'x'
PUSH 5
GT
JUMP_IF_FALSE N+1 # skip then block and its trailing JUMP, landing on else
# then block (N instructions)
JUMP M # skip else block
# else block (M instructions)
```
### While Loop
```
loop_start:
# condition (C instructions)
JUMP_IF_FALSE N # jump past loop body and the trailing JUMP
# body (N-1 instructions)
JUMP -(N+C+1) # jump back to loop_start (offset is applied after PC increment)
loop_end:
```
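The loop offsets are easy to get wrong by one, so a compiler would typically derive them from the lengths of the compiled condition and body. The sketch below assumes the relative-jump convention from the Notes section (offset is added to the PC after it has been incremented); `emitWhile` is a hypothetical helper name.

```typescript
type Instruction = { op: string; operand?: number | string };

function emitWhile(condition: Instruction[], body: Instruction[]): Instruction[] {
  // Forward jump: skip the body and the trailing JUMP.
  const exitOffset = body.length + 1;
  // Backward jump: the PC has already advanced past the JUMP, so step back
  // over the JUMP itself, the body, the JUMP_IF_FALSE, and the condition.
  const backOffset = -(condition.length + body.length + 2);
  return [
    ...condition,
    { op: 'JUMP_IF_FALSE', operand: exitOffset },
    ...body,
    { op: 'JUMP', operand: backOffset },
  ];
}
```

For a jump at index `i` with offset `k`, the landing index is `i + 1 + k`; the forward jump lands one past the final JUMP and the backward jump lands on the first condition instruction.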
### Function Definition
```
MAKE_FUNCTION <index>
STORE 'functionName'
JUMP N+1 # skip function body (N instructions plus RETURN)
function_body:
# function code (N instructions)
RETURN
skip_body:
```
### Try-Catch
```
PUSH_TRY <catch_label> # operand is the address of the catch block
# try block
POP_TRY
JUMP M # skip catch block
catch_label:
STORE 'errorVar' # Error is on stack
# catch block
end_label:
```
### Named Function Call
```
LOAD 'mkdir'
PUSH 'src/bin' # positional arg
PUSH 'recursive' # name
PUSH true # value
CALL { positional: 1, named: 1 }
```
### Tail Recursive Function
```
MAKE_FUNCTION <factorial_def>
STORE 'factorial'
JUMP 14 # skip the 14-instruction function body to main
factorial_body:
LOAD 'n'
PUSH 0
EQ
JUMP_IF_FALSE 2 # skip to recurse
LOAD 'acc'
RETURN
recurse:
LOAD 'factorial'
LOAD 'n'
PUSH 1
SUB
LOAD 'n'
LOAD 'acc'
MUL
TAIL_CALL 2 # No stack growth!
main:
LOAD 'factorial'
PUSH 5
PUSH 1
CALL 2
```
## Error Conditions
### Runtime Errors
All of these should throw errors:
1. **Undefined Variable**: LOAD of non-existent variable
2. **Type Mismatch**: ARRAY_GET on non-array, DICT_GET on non-dict, CALL on non-function
3. **Index Out of Bounds**: ARRAY_GET/SET with invalid index
4. **Stack Underflow**: Arithmetic ops without enough operands
5. **Uncaught Exception**: THROW with no exception handlers
6. **Break Outside Loop**: BREAK with no break target
7. **Continue Outside Loop**: CONTINUE with no continue target
8. **Return Outside Function**: RETURN with no call frame
9. **Unknown Function**: CALL_TYPESCRIPT with unregistered function
10. **Mismatched Handler**: POP_TRY with no handler
11. **Invalid Constant**: PUSH with invalid constant index
12. **Invalid Function Definition**: MAKE_FUNCTION with non-function_def constant
## Edge Cases
### Empty Stack
- Arithmetic/comparison ops on empty stack should throw
- RETURN with empty stack returns null
- HALT with empty stack returns null
### Null Values
- Arithmetic with null coerces to 0
- Comparisons with null follow the normal rules: LT/GT/LTE/GTE coerce null to 0 via toNumber, and EQ/NEQ use type-aware equality
- Null is falsy
### Scope Shadowing
- Variables in inner scopes shadow outer scopes during LOAD
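- STORE updates the nearest scope in the chain where the variable is already defined (checking the current scope first) and creates a new local only if none is found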
### Function Parameter Binding
- Missing positional args → use named args → use defaults → use null
- Extra positional args → collected by variadic parameter or ignored
- Extra named args → collected by kwargs parameter or ignored
- Named arg matching is case-sensitive
### Tail Call Optimization
- TAIL_CALL reuses frame, so return address is from original caller
- Multiple tail calls in sequence never grow stack
- TAIL_CALL can call different function (not just self-recursive)
### Break/Continue Semantics
- BREAK unwinds to frame that called the iterator function
- Multiple nested function calls: break exits all of them until reaching marked frame
- CONTINUE requires explicit continueAddress in frame (set by compiler for loops)
### Exception Unwinding
- THROW truncates the call stack to the handler's recorded `callStackDepth`, discarding every frame pushed after the handler was registered
- Exception handlers form a stack (nested try blocks)
- Error value on stack is available in catch block via STORE
## VM Initialization
```typescript
const vm = new VM(bytecode);
vm.registerFunction('add', (a, b) => {
  return { type: 'number', value: toNumber(a) + toNumber(b) };
});
const result = await vm.execute();
```
## Testing Considerations
### Unit Tests Should Cover
1. **Each opcode** individually with minimal setup
2. **Type coercion** for arithmetic, comparison, and logical ops
3. **Scope chain** resolution (local, parent, global)
4. **Call frames** (nested calls, return values)
5. **Exception handling** (nested try blocks, unwinding)
6. **Break/continue** (nested functions, iterator pattern)
7. **Closures** (capturing variables, multiple nesting levels)
8. **Tail calls** (self-recursive, mutual recursion)
9. **Parameter binding** (positional, named, defaults, variadic, kwargs, combinations)
10. **Array/dict operations** (creation, access, mutation)
11. **Error conditions** (all error cases listed above)
12. **Edge cases** (empty stack, null values, shadowing, etc.)
### Integration Tests Should Cover
1. **Recursive functions** (factorial, fibonacci)
2. **Iterator pattern** (each with break)
3. **Closure examples** (counters, adder factories)
4. **Exception examples** (try/catch/throw chains)
5. **Complex scope** (deeply nested functions)
6. **Mixed features** (variadic + defaults + kwargs)
### Property-Based Tests Should Cover
1. **Stack integrity** (stack size matches expectations after ops)
2. **Scope integrity** (variables remain accessible)
3. **Frame integrity** (call stack unwinds correctly)
## Version History
- **1.0** (2024): Initial specification
## Notes
- PC increment happens after each instruction execution
- Jump instructions use relative offsets (added to current PC after increment)
- All async operations (TypeScript functions) must be awaited
- Arrays and dicts are mutable (pass by reference)
- Functions are immutable values
- The VM is single-threaded (no concurrency primitives)