diff --git a/README.md b/README.md index 20617be..82794d6 100644 --- a/README.md +++ b/README.md @@ -5,4 +5,56 @@ It's where Shrimp live. ## Quickstart bun install - bun test \ No newline at end of file + bun test + +## TODO (tests) + +- [ ] PUSH +- [ ] POP +- [ ] DUP + +- [ ] LOAD +- [ ] STORE + +- [x] ADD +- [x] SUB +- [x] MUL +- [x] DIV +- [ ] MOD +- [ ] EQ +- [ ] NEQ +- [ ] LT +- [ ] GT +- [ ] LTE +- [ ] GTE +- [ ] AND +- [ ] OR +- [ ] NOT + +- [ ] JUMP +- [ ] JUMP_IF_FALSE +- [ ] JUMP_IF_TRUE +- [ ] BREAK +- [ ] CONTINUE + +- [ ] PUSH_TRY +- [ ] POP_TRY +- [ ] THROW + +- [ ] MAKE_FUNCTION +- [ ] CALL +- [ ] TAIL_CALL +- [ ] CALL_TYPESCRIPT +- [ ] RETURN + +- [ ] MAKE_ARRAY +- [ ] ARRAY_GET +- [ ] ARRAY_SET +- [ ] ARRAY_LEN + +- [ ] MAKE_DICT +- [ ] DICT_GET +- [ ] DICT_SET +- [ ] DICT_HAS + +- [ ] HALT \ No newline at end of file diff --git a/SPEC.md b/SPEC.md new file mode 100644 index 0000000..63b5c39 --- /dev/null +++ b/SPEC.md @@ -0,0 +1,631 @@ +# ReefVM Specification + +Version 1.0 + +## Overview + +The ReefVM is a stack-based bytecode virtual machine designed for the Shrimp programming language. It supports closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue. + +## Architecture + +### Components + +- **Value Stack**: Operand stack for computation +- **Call Stack**: Call frames for function invocations +- **Exception Handlers**: Stack of try/catch handlers +- **Scope Chain**: Linked scopes for lexical variable resolution +- **Program Counter (PC)**: Current instruction index +- **Constants Pool**: Immutable values and function metadata +- **TypeScript Function Registry**: External functions callable from Shrimp + +### Execution Model + +1. VM loads bytecode with instructions and constants +2. PC starts at instruction 0 +3. Each instruction is executed sequentially (unless jumps occur) +4. Execution continues until HALT or end of instructions +5. 
Final value is top of stack (or null if empty) + +## Value Types + +All runtime values are tagged unions: + +```typescript +type Value = + | { type: 'null', value: null } + | { type: 'boolean', value: boolean } + | { type: 'number', value: number } + | { type: 'string', value: string } + | { type: 'array', items: Value[] } + | { type: 'dict', entries: Map<string, Value> } + | { type: 'function', params: string[], defaults: Record<string, number>, + body: number, scope: Scope, variadic: boolean, kwargs: boolean } +``` + +### Type Coercion + +**toNumber**: number → identity, string → parseFloat (or 0), boolean → 1/0, others → 0 + +**toString**: string → identity, number → string, boolean → string, null → "null", +function → "", array → "[item, item]", dict → "{key: value, ...}" + +**isTruthy**: boolean → value, number → value !== 0, string → value !== "", +null → false, array → length > 0, dict → size > 0, others → true + +## Bytecode Format + +```typescript +type Bytecode = { + instructions: Instruction[] + constants: Constant[] +} + +type Instruction = { + op: OpCode + operand?: number | string | { positional: number; named: number } +} + +type Constant = + | Value + | { type: 'function_def', params: string[], defaults: Record<string, number>, + body: number, variadic: boolean, kwargs: boolean } +``` + +## Scope Chain + +Variables are resolved through a linked scope chain: + +```typescript +class Scope { + locals: Map<string, Value>; + parent?: Scope; +} +``` + +**Variable Resolution (LOAD)**: +1. Check current scope's locals +2. If not found, recursively check parent +3. If not found anywhere, throw error + +**Variable Assignment (STORE)**: +1. If variable exists in current scope, update it +2. Else if variable exists in any parent scope, update it there +3. Else create new variable in current scope + +This implements "assign to outermost scope where defined" semantics. 
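The LOAD and STORE rules above can be sketched as follows. This is a minimal illustration, not the VM's actual implementation: the `load` and `store` method names, the constructor, and the number-only `Value` type are assumptions made for brevity. For step 2 of STORE the sketch updates the nearest enclosing scope that already defines the name, which is one reading of "any parent scope".

```typescript
// Hypothetical, simplified Value: only numbers, to keep the sketch short.
type Value = { type: 'number'; value: number };

class Scope {
  locals: Map<string, Value> = new Map();
  parent?: Scope;

  constructor(parent?: Scope) {
    this.parent = parent;
  }

  // LOAD: check this scope, then each parent in turn; throw if never found.
  load(name: string): Value {
    for (let s: Scope | undefined = this; s !== undefined; s = s.parent) {
      const found = s.locals.get(name);
      if (found !== undefined) return found;
    }
    throw new Error(`Undefined variable: ${name}`);
  }

  // STORE: update the first scope (current, then parents) that already
  // defines the name; otherwise create the variable in the current scope.
  store(name: string, value: Value): void {
    for (let s: Scope | undefined = this; s !== undefined; s = s.parent) {
      if (s.locals.has(name)) {
        s.locals.set(name, value);
        return;
      }
    }
    this.locals.set(name, value);
  }
}
```

With a global scope that defines `x` and a child scope created from it, a `store('x', ...)` on the child updates the global binding rather than creating a local one, while storing a new name creates it in the child only.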
+ +## Call Frames + +```typescript +type CallFrame = { + returnAddress: number // Where to resume after RETURN + returnScope: Scope // Scope to restore after RETURN + isBreakTarget: boolean // Can be targeted by BREAK + continueAddress?: number // Where to jump for CONTINUE +} +``` + +## Exception Handlers + +```typescript +type ExceptionHandler = { + catchAddress: number // Where to jump on exception + callStackDepth: number // Call stack depth when handler pushed + scope: Scope // Scope to restore in catch block +} +``` + +## Opcodes + +### Stack Operations + +#### PUSH +**Operand**: Index into constants pool (number) +**Effect**: Push constant onto stack +**Stack**: [] → [value] + +#### POP +**Operand**: None +**Effect**: Discard top of stack +**Stack**: [value] → [] + +#### DUP +**Operand**: None +**Effect**: Duplicate top of stack +**Stack**: [value] → [value, value] + +### Variable Operations + +#### LOAD +**Operand**: Variable name (string) +**Effect**: Push variable value onto stack +**Stack**: [] → [value] +**Errors**: Throws if variable not found in scope chain + +#### STORE +**Operand**: Variable name (string) +**Effect**: Store top of stack into variable (following scope chain rules) +**Stack**: [value] → [] + +### Arithmetic Operations + +All arithmetic operations pop two values, perform operation, push result as number. + +#### ADD +**Stack**: [a, b] → [a + b] +**Note**: Only for numbers (use separate string concat if needed) + +#### SUB +**Stack**: [a, b] → [a - b] + +#### MUL +**Stack**: [a, b] → [a * b] + +#### DIV +**Stack**: [a, b] → [a / b] + +#### MOD +**Stack**: [a, b] → [a % b] + +### Comparison Operations + +All comparison operations pop two values, compare, push boolean (as number 1/0). + +#### EQ +**Stack**: [a, b] → [a == b ? 1 : 0] +**Note**: Type-aware equality + +#### NEQ +**Stack**: [a, b] → [a != b ? 1 : 0] + +#### LT +**Stack**: [a, b] → [a < b ? 1 : 0] + +#### GT +**Stack**: [a, b] → [a > b ? 
1 : 0] + +#### LTE +**Stack**: [a, b] → [a <= b ? 1 : 0] + +#### GTE +**Stack**: [a, b] → [a >= b ? 1 : 0] + +### Logical Operations + +#### AND +**Stack**: [a, b] → [isTruthy(a) && isTruthy(b) ? 1 : 0] + +#### OR +**Stack**: [a, b] → [isTruthy(a) || isTruthy(b) ? 1 : 0] + +#### NOT +**Stack**: [a] → [!isTruthy(a)] + +### Control Flow + +#### JUMP +**Operand**: Instruction address (number) +**Effect**: Set PC to address +**Stack**: No change + +#### JUMP_IF_FALSE +**Operand**: Instruction address (number) +**Effect**: If top of stack is falsy, jump to address +**Stack**: [condition] → [] + +#### JUMP_IF_TRUE +**Operand**: Instruction address (number) +**Effect**: If top of stack is truthy, jump to address +**Stack**: [condition] → [] + +#### BREAK +**Operand**: None +**Effect**: Unwind call stack until frame with `isBreakTarget = true`, resume there +**Stack**: No change +**Errors**: Throws if no break target found + +**Behavior**: +1. Pop frames from call stack +2. For each frame, restore its returnScope and returnAddress +3. Stop when finding frame with `isBreakTarget = true` +4. Resume execution at that frame's return address + +#### CONTINUE +**Operand**: None +**Effect**: Unwind to nearest frame with `continueAddress`, jump there +**Stack**: No change +**Errors**: Throws if no continue target found + +**Behavior**: +1. Search call stack (without popping) for frame with `continueAddress` +2. When found, restore scope and jump to `continueAddress` +3. Pop all frames above the continue target + +### Exception Handling + +#### PUSH_TRY +**Operand**: Catch block address (number) +**Effect**: Push exception handler +**Stack**: No change + +Registers a try block. If THROW occurs before POP_TRY, execution jumps to catch address. 
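As a rough illustration of how PUSH_TRY might record a handler, here is a small sketch of a handler stack built on the `ExceptionHandler` shape defined earlier. The `HandlerStack` class, its method names, and the placeholder `Scope` type are hypothetical and exist only for this example.

```typescript
// Placeholder scope type for the sketch; the real Scope carries locals and a parent.
type Scope = { label: string };

type ExceptionHandler = {
  catchAddress: number;   // Where to jump on exception
  callStackDepth: number; // Call stack depth when handler pushed
  scope: Scope;           // Scope to restore in catch block
};

class HandlerStack {
  private handlers: ExceptionHandler[] = [];

  // PUSH_TRY: remember where the catch block is and what state to restore.
  pushTry(catchAddress: number, callStackDepth: number, scope: Scope): void {
    this.handlers.push({ catchAddress, callStackDepth, scope });
  }

  // POP_TRY: the try block finished normally, so drop its handler.
  popTry(): ExceptionHandler {
    const handler = this.handlers.pop();
    if (handler === undefined) throw new Error('POP_TRY: no handler to pop');
    return handler;
  }

  get size(): number {
    return this.handlers.length;
  }
}
```

Because handlers live on a stack, nested try blocks resolve to the innermost handler first.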
+ +#### POP_TRY +**Operand**: None +**Effect**: Pop exception handler (try block completed without exception) +**Stack**: No change +**Errors**: Throws if no handler to pop + +#### THROW +**Operand**: None +**Effect**: Throw exception with error value from stack +**Stack**: [errorValue] → (unwound) + +**Behavior**: +1. Pop error value from stack +2. If no exception handlers, throw JavaScript Error with error message +3. Otherwise, pop most recent exception handler +4. Unwind call stack to handler's depth +5. Restore handler's scope +6. Push error value back onto stack +7. Jump to handler's catch address + +### Function Operations + +#### MAKE_FUNCTION +**Operand**: Index into constants pool (number) +**Effect**: Create function value, capturing current scope +**Stack**: [] → [function] + +The constant must be a `function_def` with: +- `params`: Parameter names +- `defaults`: Map of param names to constant indices for default values +- `body`: Instruction address of function body +- `variadic`: If true, last param collects remaining positional args as array +- `kwargs`: If true, last param collects all named args as dict + +The created function captures `currentScope` as its `parentScope`. + +#### CALL +**Operand**: Either: +- Number: positional argument count +- Object: `{ positional: number, named: number }` + +**Stack**: [fn, arg1, arg2, ..., name1, val1, name2, val2, ...] → [returnValue] + +**Behavior**: +1. Pop function from stack +2. Pop named arguments (name/value pairs) according to operand +3. Pop positional arguments according to operand +4. Mark current frame (if exists) as break target (`isBreakTarget = true`) +5. Push new call frame with current PC and scope +6. Create new scope with function's parentScope as parent +7. 
Bind parameters: + - For regular functions: bind params by position, then by name, then defaults, then null + - For variadic functions: bind fixed params, collect rest into array + - For kwargs functions: bind fixed params, collect named args into dict +8. Set currentScope to new scope +9. Jump to function body + +**Parameter Binding Priority**: +1. Named argument (if provided) +2. Positional argument (if provided) +3. Default value (if defined) +4. Null + +**Errors**: Throws if top of stack is not a function + +#### TAIL_CALL +**Operand**: Same as CALL +**Effect**: Same as CALL, but reuses current call frame +**Stack**: Same as CALL + +**Behavior**: Identical to CALL except: +- Does NOT push a new call frame +- Replaces currentScope instead of creating nested scope +- Enables unbounded tail recursion without stack overflow + +#### RETURN +**Operand**: None +**Effect**: Return from function +**Stack**: [returnValue] → (restored stack with returnValue on top) + +**Behavior**: +1. Pop return value (or null if stack empty) +2. Pop call frame +3. Restore scope from frame +4. Set PC to frame's return address +5. Push return value onto stack + +**Errors**: Throws if no call frame to return from + +### Array Operations + +#### MAKE_ARRAY +**Operand**: Number of items (number) +**Effect**: Create array from N stack items +**Stack**: [item1, item2, ..., itemN] → [array] + +Items are popped in reverse order (item1 is array[0]). + +#### ARRAY_GET +**Operand**: None +**Effect**: Get array element at index +**Stack**: [array, index] → [value] +**Errors**: Throws if not array or index out of bounds + +Index is coerced to number and floored. 
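A minimal sketch of ARRAY_GET under the rules above, using a cut-down `Value` type. The `arrayGet` function name and error messages are hypothetical, and treating negative indices as out of bounds is an assumption the spec does not state explicitly.

```typescript
// Cut-down Value type for the sketch (the full type has more variants).
type Value =
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; items: Value[] };

// toNumber per the coercion rules: number → identity,
// string → parseFloat (or 0), others → 0.
function toNumber(v: Value): number {
  if (v.type === 'number') return v.value;
  if (v.type === 'string') {
    const n = parseFloat(v.value);
    return Number.isNaN(n) ? 0 : n;
  }
  return 0;
}

// ARRAY_GET: [array, index] → [value]. The index is on top of the stack.
function arrayGet(stack: Value[]): void {
  const index = stack.pop();
  const array = stack.pop();
  if (index === undefined || array === undefined) {
    throw new Error('ARRAY_GET: stack underflow');
  }
  if (array.type !== 'array') {
    throw new Error('ARRAY_GET: not an array');
  }
  // Coerce to number, then floor, as the spec describes.
  const i = Math.floor(toNumber(index));
  // Assumption: negative indices count as out of bounds.
  if (i < 0 || i >= array.items.length) {
    throw new Error('ARRAY_GET: index out of bounds');
  }
  stack.push(array.items[i]);
}
```

For example, an index of `1.7` or the string `"1"` both resolve to element 1 after coercion and flooring.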
+ +#### ARRAY_SET +**Operand**: None +**Effect**: Set array element at index (mutates array) +**Stack**: [array, index, value] → [] +**Errors**: Throws if not array or index out of bounds + +#### ARRAY_LEN +**Operand**: None +**Effect**: Get array length +**Stack**: [array] → [length] +**Errors**: Throws if not array + +### Dictionary Operations + +#### MAKE_DICT +**Operand**: Number of key-value pairs (number) +**Effect**: Create dict from N key-value pairs +**Stack**: [key1, val1, key2, val2, ...] → [dict] + +Keys are coerced to strings. + +#### DICT_GET +**Operand**: None +**Effect**: Get dict value for key +**Stack**: [dict, key] → [value] + +Returns null if key not found. Key is coerced to string. +**Errors**: Throws if not dict + +#### DICT_SET +**Operand**: None +**Effect**: Set dict value for key (mutates dict) +**Stack**: [dict, key, value] → [] + +Key is coerced to string. +**Errors**: Throws if not dict + +#### DICT_HAS +**Operand**: None +**Effect**: Check if key exists in dict +**Stack**: [dict, key] → [boolean] + +Key is coerced to string. +**Errors**: Throws if not dict + +### TypeScript Interop + +#### CALL_TYPESCRIPT +**Operand**: Function name (string) +**Effect**: Call registered TypeScript function +**Stack**: [...args] → [returnValue] + +**Behavior**: +1. Look up function by name in registry +2. Mark current frame (if exists) as break target +3. Await function call (TypeScript function receives arguments and returns a Value) +4. 
Push return value onto stack + +**Notes**: +- TypeScript functions are passed the raw stack values as arguments +- They must return a valid Value +- They can be async (VM awaits them) +- Like CALL, but function is from TypeScript registry instead of stack + +**Errors**: Throws if function not found + +**TypeScript Function Signature**: +```typescript +type TypeScriptFunction = (...args: Value[]) => Promise<Value> | Value; +``` + +### Special + +#### HALT +**Operand**: None +**Effect**: Stop execution +**Stack**: No change + +## Common Bytecode Patterns + +### If-Else Statement +``` +LOAD 'x' +PUSH 5 +GT +JUMP_IF_FALSE else_label + # then block + JUMP end_label +else_label: + # else block +end_label: +``` + +### While Loop +``` +loop_start: + # condition + JUMP_IF_FALSE loop_end + # body + JUMP loop_start +loop_end: +``` + +### Function Definition +``` +MAKE_FUNCTION +STORE 'functionName' +JUMP skip_body +function_body: + # function code + RETURN +skip_body: +``` + +### Try-Catch +``` +PUSH_TRY catch_label + # try block +POP_TRY +JUMP end_label +catch_label: + STORE 'errorVar' # Error is on stack + # catch block +end_label: +``` + +### Named Function Call +``` +LOAD 'mkdir' +PUSH 'src/bin' # positional arg +PUSH 'recursive' # name +PUSH true # value +CALL { positional: 1, named: 1 } +``` + +### Tail Recursive Function +``` +MAKE_FUNCTION +STORE 'factorial' +JUMP main +factorial_body: + LOAD 'n' + PUSH 0 + EQ + JUMP_IF_FALSE recurse + LOAD 'acc' + RETURN +recurse: + LOAD 'factorial' + LOAD 'n' + PUSH 1 + SUB + LOAD 'n' + LOAD 'acc' + MUL + TAIL_CALL 2 # No stack growth! +main: + LOAD 'factorial' + PUSH 5 + PUSH 1 + CALL 2 +``` + +## Error Conditions + +### Runtime Errors + +All of these should throw errors: + +1. **Undefined Variable**: LOAD of non-existent variable +2. **Type Mismatch**: ARRAY_GET on non-array, DICT_GET on non-dict, CALL on non-function +3. **Index Out of Bounds**: ARRAY_GET/SET with invalid index +4. 
**Stack Underflow**: Arithmetic ops without enough operands +5. **Uncaught Exception**: THROW with no exception handlers +6. **Break Outside Loop**: BREAK with no break target +7. **Continue Outside Loop**: CONTINUE with no continue target +8. **Return Outside Function**: RETURN with no call frame +9. **Unknown Function**: CALL_TYPESCRIPT with unregistered function +10. **Mismatched Handler**: POP_TRY with no handler +11. **Invalid Constant**: PUSH with invalid constant index +12. **Invalid Function Definition**: MAKE_FUNCTION with non-function_def constant + +## Edge Cases + +### Empty Stack +- Arithmetic/comparison ops on empty stack should throw +- RETURN with empty stack returns null +- HALT with empty stack returns null + +### Null Values +- Arithmetic with null coerces to 0 +- Comparisons with null work normally +- Null is falsy + +### Scope Shadowing +- Variables in inner scopes shadow outer scopes during LOAD +- STORE updates outermost scope where variable is defined + +### Function Parameter Binding +- Missing positional args → use named args → use defaults → use null +- Extra positional args → collected by variadic parameter or ignored +- Extra named args → collected by kwargs parameter or ignored +- Named arg matching is case-sensitive + +### Tail Call Optimization +- TAIL_CALL reuses frame, so return address is from original caller +- Multiple tail calls in sequence never grow stack +- TAIL_CALL can call different function (not just self-recursive) + +### Break/Continue Semantics +- BREAK unwinds to frame that called the iterator function +- Multiple nested function calls: break exits all of them until reaching marked frame +- CONTINUE requires explicit continueAddress in frame (set by compiler for loops) + +### Exception Unwinding +- THROW unwinds call stack to handler's depth, not just to handler +- Exception handlers form a stack (nested try blocks) +- Error value on stack is available in catch block via STORE + +## VM Initialization + +```typescript 
+const vm = new VM(bytecode); +vm.registerFunction('add', (a, b) => { + return { type: 'number', value: toNumber(a) + toNumber(b) } +}) +const result = await vm.execute() +``` + +## Testing Considerations + +### Unit Tests Should Cover + +1. **Each opcode** individually with minimal setup +2. **Type coercion** for arithmetic, comparison, and logical ops +3. **Scope chain** resolution (local, parent, global) +4. **Call frames** (nested calls, return values) +5. **Exception handling** (nested try blocks, unwinding) +6. **Break/continue** (nested functions, iterator pattern) +7. **Closures** (capturing variables, multiple nesting levels) +8. **Tail calls** (self-recursive, mutual recursion) +9. **Parameter binding** (positional, named, defaults, variadic, kwargs, combinations) +10. **Array/dict operations** (creation, access, mutation) +11. **Error conditions** (all error cases listed above) +12. **Edge cases** (empty stack, null values, shadowing, etc.) + +### Integration Tests Should Cover + +1. **Recursive functions** (factorial, fibonacci) +2. **Iterator pattern** (each with break) +3. **Closure examples** (counters, adder factories) +4. **Exception examples** (try/catch/throw chains) +5. **Complex scope** (deeply nested functions) +6. **Mixed features** (variadic + defaults + kwargs) + +### Property-Based Tests Should Cover + +1. **Stack integrity** (stack size matches expectations after ops) +2. **Scope integrity** (variables remain accessible) +3. 
**Frame integrity** (call stack unwinds correctly) + +## Version History + +- **1.0** (2024): Initial specification + +## Notes + +- PC increment happens after each instruction execution +- Jump instructions compensate for automatic PC increment (subtract 1) +- All async operations (TypeScript functions) must be awaited +- Arrays and dicts are mutable (pass by reference) +- Functions are immutable values +- The VM is single-threaded (no concurrency primitives) \ No newline at end of file diff --git a/tests/basic.test.ts b/tests/basic.test.ts index 2753aaf..e48b9fa 100644 --- a/tests/basic.test.ts +++ b/tests/basic.test.ts @@ -43,5 +43,12 @@ test("dividing numbers", async () => { DIV ` expect(await run(toBytecode(str))).toEqual({ type: 'number', value: 5 }) + + const str2 = ` + PUSH 10 + PUSH 0 + DIV +` + expect(await run(toBytecode(str2))).toEqual({ type: 'number', value: Infinity }) })