forked from defunkt/ReefVM
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

ReefVM is a stack-based bytecode virtual machine for the Shrimp programming language. It implements a complete VM with closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.

**Essential reading**: Before making changes, read README.md, SPEC.md, and GUIDE.md to understand the VM architecture, instruction set, and compiler patterns.

## Development Commands

### Running Files
```bash
bun <file.ts>              # Run TypeScript files directly
bun examples/native.ts     # Run example
```

### Testing
```bash
bun test                   # Run all tests
bun test <file>            # Run specific test file
bun test --watch           # Watch mode
```

### Tools
```bash
./bin/reef <file.reef>     # Execute bytecode file
./bin/validate <file.reef> # Validate bytecode
./bin/debug <file.reef>    # Step-by-step debugger
./bin/repl                 # Interactive REPL
```

### Building
No build step required - Bun runs TypeScript directly.

## Architecture

### Core Components

**VM Execution Model** (src/vm.ts):
- Stack-based execution with program counter (PC)
- Call stack for function frames
- Exception handler stack for try/catch/finally
- Lexical scope chain with parent references (includes native functions)

**Key subsystems**:
- **bytecode.ts**: Compiler that converts both string and array formats to executable bytecode. Handles label resolution, constant pool management, and function definition parsing. The `toBytecode()` function accepts either a string (human-readable) or typed array format (programmatic).
- **value.ts**: Tagged union Value type system with type coercion functions (toNumber, toString, isTrue, isEqual)
- **scope.ts**: Linked scope chain for variable resolution with lexical scoping
- **frame.ts**: Call frame tracking for function calls and break targets
- **exception.ts**: Exception handler records for try/catch/finally blocks
- **validator.ts**: Bytecode validation to catch common errors before execution
- **opcode.ts**: OpCode enum defining all VM instructions

### Critical Design Decisions

**Label-based jumps**: All JUMP instructions (`JUMP`, `JUMP_IF_FALSE`, `JUMP_IF_TRUE`) require label operands (`.label`), not numeric offsets. Labels are resolved to PC-relative offsets during compilation, making bytecode position-independent. PUSH_TRY/PUSH_FINALLY use absolute addresses and can accept either labels or numeric offsets.

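To make the resolution step concrete, here is a minimal sketch of a two-pass label resolver over a toy instruction list. The `Instr` type, the `resolveLabels` helper, and the convention that offsets are measured from the instruction after the jump are all assumptions for illustration. The real logic lives in src/bytecode.ts and may differ.

```typescript
// Hypothetical sketch: not ReefVM's actual compiler code.
type Instr = [op: string, operand?: string | number]

const JUMP_OPS = new Set(["JUMP", "JUMP_IF_FALSE", "JUMP_IF_TRUE"])

function resolveLabels(source: Instr[]): Instr[] {
  // Pass 1: strip label definitions (".name:") and record the index
  // of the instruction each label points at.
  const labels = new Map<string, number>()
  const code: Instr[] = []
  for (const instr of source) {
    const op = instr[0]
    if (op.startsWith(".") && op.endsWith(":")) {
      labels.set(op.slice(0, -1), code.length)
    } else {
      code.push(instr)
    }
  }

  // Pass 2: rewrite jump operands as offsets relative to the next
  // instruction (assumed convention).
  return code.map((instr, pc): Instr => {
    const [op, operand] = instr
    if (!JUMP_OPS.has(op)) return instr
    if (typeof operand !== "string") {
      throw new Error(`${op} requires a label operand`)
    }
    const target = labels.get(operand)
    if (target === undefined) throw new Error(`Unknown label: ${operand}`)
    return [op, target - (pc + 1)]
  })
}

const resolved = resolveLabels([
  ["JUMP", ".skip"],
  ["PUSH", 42],
  ["HALT"],
  [".skip:"],
  ["PUSH", 99],
  ["HALT"],
])
// resolved[0] is ["JUMP", 2]: two instructions past the one after the jump
```

Because the operand is relative, the same resolved sequence stays valid wherever it is placed in a larger program, which is the position-independence property described above.
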
**Truthiness semantics**: Only `null` and `false` are falsy. Unlike JavaScript, `0`, `""`, empty arrays, and empty dicts are truthy.

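The contrast with JavaScript can be sketched as follows. The `Value` union here is a simplified stand-in and `isTruthy` is a hypothetical name. The actual type and its `isTrue` function are defined in src/value.ts and cover more variants.

```typescript
// Hypothetical, simplified Value union for illustration only.
type Value =
  | { type: "null"; value: null }
  | { type: "boolean"; value: boolean }
  | { type: "number"; value: number }
  | { type: "string"; value: string }
  | { type: "array"; value: Value[] }

// Only null and false are falsy; every other value is truthy.
function isTruthy(v: Value): boolean {
  if (v.type === "null") return false
  if (v.type === "boolean") return v.value
  return true
}

const zero: Value = { type: "number", value: 0 }
const empty: Value = { type: "string", value: "" }
const nothing: Value = { type: "null", value: null }

console.log(isTruthy(zero), Boolean(zero.value))   // true false
console.log(isTruthy(empty), Boolean(empty.value)) // true false
console.log(isTruthy(nothing))                     // false
```
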
**No AND/OR opcodes**: Short-circuit logical operations are implemented at the compiler level using JUMP patterns with DUP.

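As a rough illustration of the DUP-plus-jump pattern, the toy interpreter below compiles `left and right` without a dedicated AND instruction. It is a sketch under stated assumptions: the `Op` encoding, `runToy`, and `compileAnd` are hypothetical, the jump operand is an absolute index rather than a label, and `JUMP_IF_FALSE` is assumed to pop its condition. See GUIDE.md for the patterns ReefVM compilers actually emit.

```typescript
// Toy interpreter for illustration only; not ReefVM's instruction encoding.
type Val = number | boolean | null
type Op =
  | ["PUSH", Val]
  | ["DUP"]
  | ["POP"]
  | ["JUMP_IF_FALSE", number] // absolute index in this toy (ReefVM uses labels)

function runToy(code: Op[]): Val {
  const stack: Val[] = []
  let pc = 0
  while (pc < code.length) {
    const instr = code[pc++]
    if (!instr) break
    switch (instr[0]) {
      case "PUSH":
        stack.push(instr[1])
        break
      case "DUP": {
        const top = stack[stack.length - 1]
        if (top === undefined) throw new Error("Stack underflow")
        stack.push(top)
        break
      }
      case "POP":
        stack.pop()
        break
      case "JUMP_IF_FALSE": {
        // Assumed: the jump pops its condition, which is why DUP comes first.
        const cond = stack.pop()
        if (cond === null || cond === false) pc = instr[1]
        break
      }
    }
  }
  return stack[stack.length - 1] ?? null
}

// `left and right`: keep `left` if it is falsy, otherwise evaluate `right`.
function compileAnd(left: Val, right: Val): Op[] {
  return [
    ["PUSH", left],
    ["DUP"],
    ["JUMP_IF_FALSE", 5], // skip the right-hand side, leaving `left` on the stack
    ["POP"],              // discard `left`
    ["PUSH", right],
  ]
}

console.log(runToy(compileAnd(false, 1))) // false
console.log(runToy(compileAnd(0, 7)))     // 7 (0 is truthy)
```

An OR would follow the same shape with `JUMP_IF_TRUE` in place of `JUMP_IF_FALSE`.
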
**Tail call optimization**: TAIL_CALL reuses the current call frame instead of pushing a new one, enabling unbounded recursion.

**Break semantics**: CALL marks frames as break targets. BREAK unwinds the call stack to the most recent break target, enabling Ruby-style iterator patterns.

**Exception handling**: THROW jumps to finally (if present) or catch. The VM does NOT auto-jump to finally on successful try completion - compilers must explicitly generate JUMPs to finally blocks.

**Parameter binding priority**: Named args bind to fixed params first. Unmatched named args go to `@named` dict parameter. Fixed params bind in order: named arg > positional arg > default > null.

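The binding order can be sketched as a small standalone function. Everything here is hypothetical: the `Param` shape, the `bindParams` name, and in particular the assumption that positional arguments fill only the parameters not already claimed by a named argument. SPEC.md is the authority on the exact rules.

```typescript
// Hypothetical sketch of the binding order; not ReefVM's implementation.
type Param = { name: string; default?: unknown }

function bindParams(
  params: Param[],
  positional: unknown[],
  named: Record<string, unknown>,
): { bound: Record<string, unknown>; extraNamed: Record<string, unknown> } {
  const bound: Record<string, unknown> = {}
  const extraNamed: Record<string, unknown> = { ...named }
  let nextPositional = 0

  for (const param of params) {
    if (Object.prototype.hasOwnProperty.call(named, param.name)) {
      // 1. A matching named argument wins.
      bound[param.name] = named[param.name]
      delete extraNamed[param.name]
    } else if (nextPositional < positional.length) {
      // 2. Otherwise take the next unused positional argument (assumed).
      bound[param.name] = positional[nextPositional++]
    } else if (param.default !== undefined) {
      // 3. Otherwise fall back to the declared default.
      bound[param.name] = param.default
    } else {
      // 4. Otherwise bind null.
      bound[param.name] = null
    }
  }

  // Whatever is left in extraNamed would go to the `@named` dict parameter.
  return { bound, extraNamed }
}

const params: Param[] = [{ name: "x" }, { name: "y", default: 10 }]
bindParams(params, [5], {})            // bound: { x: 5, y: 10 }
bindParams(params, [5], { y: 20 })     // bound: { x: 5, y: 20 }
bindParams(params, [], { y: 1, z: 2 }) // bound: { x: null, y: 1 }, extraNamed: { z: 2 }
```
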
**Native function calling**: Native functions are stored in scope and called via LOAD + CALL, using the same calling convention as Reef functions. Named arguments are supported by extracting parameter names from the function signature at call time.

## Testing Strategy

Tests are organized by feature area:
- **opcodes.test.ts**: Stack ops, arithmetic, comparisons, variables, control flow
- **functions.test.ts**: Function creation, calls, closures, defaults, variadic, named args
- **tail-call.test.ts**: Tail call optimization and unbounded recursion
- **exceptions.test.ts**: Try/catch/finally, exception unwinding, nested handlers
- **native.test.ts**: Native function interop (sync and async)
- **functions-parameter.test.ts**: Convenience parameter for passing functions to run() and VM
- **bytecode.test.ts**: Bytecode string parser, label resolution, constants
- **programmatic.test.ts**: Array format API, typed tuples, labels, functions
- **validator.test.ts**: Bytecode validation rules
- **unicode.test.ts**: Unicode and emoji identifiers
- **regex.test.ts**: RegExp support
- **examples.test.ts**: Integration tests for example programs

When adding features:
1. Add unit tests for the specific opcode/feature
2. Add integration tests showing real-world usage
3. Update SPEC.md with formal specification
4. Update GUIDE.md with compiler patterns
5. Consider adding an example to examples/

## Common Patterns

### Writing Bytecode Tests

ReefVM supports two bytecode formats: string and array.

**String format** (human-readable):
```typescript
import { toBytecode, run } from "#reef"

const bytecode = toBytecode(`
  PUSH 42
  STORE x
  LOAD x
  HALT
`)

const result = await run(bytecode)
// result is { type: 'number', value: 42 }
```

**Array format** (programmatic, type-safe):
```typescript
import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],
  ["STORE", "x"],
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)
// result is { type: 'number', value: 42 }
```

Array format features:
- Typed tuples for compile-time type checking
- Labels defined as `[".label:"]` (single-element arrays with colon suffix)
- Label references as strings: `["JUMP", ".label"]` (no colon in references)
- Function params as string arrays: `["MAKE_FUNCTION", ["x", "y=10"], ".body"]`
- See `tests/programmatic.test.ts` and `examples/programmatic.ts` for examples

### Native Function Registration and Global Values

**Option 1**: Pass to `run()` or `VM` constructor (convenience)
```typescript
const result = await run(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`,
  pi: 3.14159,
  config: { debug: true, port: 8080 }
})

// Or with VM constructor
const vm = new VM(bytecode, { add, greet, pi, config })
```

**Option 2**: Set values with `vm.set()` (manual)
```typescript
const vm = new VM(bytecode)

// Set functions (auto-wrapped to native functions)
vm.set('add', (a: number, b: number) => a + b)

// Set any other values (auto-converted to ReefVM Values)
vm.set('pi', 3.14159)
vm.set('config', { debug: true, port: 8080 })

await vm.run()
```

**Option 3**: Set Value-based functions with `vm.setValueFunction()` (advanced)

For functions that work directly with ReefVM Value types:

```typescript
const vm = new VM(bytecode)

// Set Value-based function (no wrapping, works directly with Values)
vm.setValueFunction('customOp', (a: Value, b: Value): Value => {
  return toValue(toNumber(a) + toNumber(b))
})

await vm.run()
```

Auto-wrapping handles:
- Functions: wrapped as native functions with Value ↔ native type conversion
- Sync and async functions
- Arrays, objects, primitives, null, RegExp
- All values converted via `toValue()`

### Calling Functions from TypeScript

Use `vm.call()` to invoke Reef or native functions from TypeScript:

```typescript
const bytecode = toBytecode(`
  MAKE_FUNCTION (x y=10) .add
  STORE add
  HALT

  .add:
    LOAD x
    LOAD y
    ADD
    RETURN
`)

const vm = new VM(bytecode, {
  log: (msg: string) => console.log(msg) // Native function
})
await vm.run()

// Call Reef function with positional arguments
const result1 = await vm.call('add', 5, 3) // → 8

// Call Reef function with named arguments (pass final object)
const result2 = await vm.call('add', 5, { y: 20 }) // → 25

// Call Reef function with all named arguments
const result3 = await vm.call('add', { x: 10, y: 15 }) // → 25

// Call native function
await vm.call('log', 'Hello!')
```

**How it works**:
- Looks up function (Reef or native) in VM scope
- For Reef functions: converts to callable JavaScript function using `fnFromValue`
- For native functions: calls directly
- Automatically converts arguments to ReefVM Values
- Converts result back to JavaScript types

### Label Usage (Required for JUMP instructions)

All JUMP instructions must use labels:
```
JUMP .skip
PUSH 42
HALT
.skip:
  PUSH 99
  HALT
```

### Function Definition Patterns

When defining functions, you MUST prevent the PC from falling through into function bodies. Two patterns:

**Pattern 1: JUMP over function bodies (Recommended)**
```
MAKE_FUNCTION (params) .body
STORE function_name
JUMP .end           ; Skip over function body
.body:
  <function code>
  RETURN
.end:
  <continue with program>
```

**Pattern 2: Function bodies after HALT**
```
MAKE_FUNCTION (params) .body
STORE function_name
<use the function>
HALT                ; Stop before function bodies
.body:
  <function code>
  RETURN
```

Pattern 1 is required for:
- Defining multiple functions before using them
- REPL mode
- Any case where execution continues after defining a function

Pattern 2 only works if you HALT before reaching function bodies.

### REPL Mode (Incremental Execution)

For building REPLs (like the Shrimp REPL), use `vm.continue()` and `vm.appendBytecode()`:

```typescript
const vm = new VM(toBytecode([]), natives)
await vm.run()  // Initialize (empty bytecode)

// User enters: x = 42
const line1 = compileLine("x = 42")  // No HALT!
vm.appendBytecode(line1)
await vm.continue()  // Execute only line 1

// User enters: x + 10
const line2 = compileLine("x + 10")  // No HALT!
vm.appendBytecode(line2)
await vm.continue()  // Execute only line 2, result is 52
```

**Key points**:
- `vm.run()` resets PC to 0 (re-executes everything) - use for initial setup only
- `vm.continue()` resumes from current PC (executes only new bytecode)
- `vm.appendBytecode(bytecode)` properly handles constant index remapping
- Don't use HALT in REPL lines - let VM stop naturally
- Scope and variables persist across all lines
- Side effects only run once

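The difference between restarting and resuming can be modeled with a toy program counter. This is only a sketch: `ToyVM` is hypothetical, its `resume()` method stands in for `vm.continue()`, and each "instruction" is just a closure so that side effects are easy to count.

```typescript
// Toy model of the PC behavior described above; not ReefVM's VM class.
class ToyVM {
  private pc = 0
  private code: Array<() => void> = []

  // Stand-in for appendBytecode(): new instructions go at the end.
  append(steps: Array<() => void>): void {
    this.code.push(...steps)
  }

  // Restart from the beginning: every instruction runs again.
  run(): void {
    this.pc = 0
    this.resume()
  }

  // Resume from the current PC: only instructions not yet executed run.
  resume(): void {
    while (this.pc < this.code.length) {
      const step = this.code[this.pc++]
      if (step) step()
    }
  }
}

const log: string[] = []
const toy = new ToyVM()

toy.append([() => log.push("line 1")])
toy.resume()
toy.append([() => log.push("line 2")])
toy.resume()
// log is ["line 1", "line 2"]: each side effect ran exactly once

toy.run()
// log is now ["line 1", "line 2", "line 1", "line 2"]: run() replayed everything
```
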
## TypeScript Configuration

- Import alias: `#reef` maps to `./src/index.ts`
- Module system: ES modules (`"type": "module"` in package.json)
- Bun automatically handles TypeScript compilation

## Bun-Specific Notes

- Use `bun` instead of `node`, `npm`, `pnpm`, or `vite`
- No need for dotenv - Bun loads .env automatically
- Prefer Bun APIs over Node.js equivalents when available
- See .cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc for detailed Bun usage

## Adding a New OpCode

When adding a new instruction to ReefVM, you must update multiple files in a specific order. Follow this checklist:

### 1. Define the OpCode (src/opcode.ts)

Add the new opcode to the `OpCode` enum with comprehensive documentation:

```typescript
export enum OpCode {
  // ... existing opcodes

  MY_NEW_OP,  // operand: <type> | stack: [inputs] → [outputs]
              // Description of what it does
              // Any important behavioral notes
}
```

### 2. Implement VM Execution (src/vm.ts)

Add a case to the `execute()` method's switch statement:

```typescript
async execute(instruction: Instruction) {
  switch (instruction.op) {
    // ... existing cases

    case OpCode.MY_NEW_OP:
      // Implementation
      // - Pop values from this.stack as needed
      // - Perform the operation
      // - Push results to this.stack
      // - Throw errors for invalid operations
      // - Use await for async operations
      break
  }
}
```

Common helper methods:
- `this.binaryOp((a, b) => ...)` - For binary arithmetic/comparison
- `toNumber(value)`, `toString(value)`, `isTrue(value)`, `isEqual(a, b)` - Type coercion
- `this.scope.get(name)`, `this.scope.set(name, value)` - Variable access

### 3. Update Validator (src/validator.ts)

Add the opcode to the appropriate set:

```typescript
// If your opcode requires an operand:
const OPCODES_WITH_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])

// If your opcode takes no operand:
const OPCODES_WITHOUT_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])
```

If your opcode has complex operand validation, add a specific check in the validation loop around line 154.

### 4. Update Array API (src/bytecode.ts)

Add your instruction to the `InstructionTuple` type:

```typescript
type InstructionTuple =
  // ... existing types
  | ["MY_NEW_OP"]                  // No operand
  | ["MY_NEW_OP", string]          // String operand
  | ["MY_NEW_OP", number]          // Number operand
  | ["MY_NEW_OP", string, number]  // Multiple operands
```

If your opcode has special operand handling, add a case in `toBytecodeFromArray()` around line 241.

### 5. Write Tests (REQUIRED)

Create tests in the appropriate test file:

```typescript
// tests/basic.test.ts, tests/functions.test.ts, etc.
import { test, expect } from "bun:test"
import { toBytecode, run } from "#reef"

test("MY_NEW_OP description", async () => {
  const bytecode = toBytecode([
    // Setup
    ["PUSH", 42],
    ["MY_NEW_OP"],
    ["HALT"]
  ])

  const result = await run(bytecode)
  expect(result).toEqual({ type: "number", value: 42 })
})

// Test edge cases
test("MY_NEW_OP with invalid input", async () => {
  // Test error conditions, e.g. running the opcode with nothing on the stack
  const bytecode = toBytecode([
    ["MY_NEW_OP"],
    ["HALT"]
  ])
  await expect(run(bytecode)).rejects.toThrow()
})
```

**ALWAYS write tests.** Test both success cases and error conditions. Add integration tests showing real-world usage.

### 6. Document Specification (SPEC.md)

Add a formal specification entry:

```markdown
#### MY_NEW_OP

**Operand**: `<type>`
**Stack**: `[input] → [output]`

Description of what the instruction does.

**Behavior**:
- Specific behavior point 1
- Specific behavior point 2

**Errors**:
- Error condition 1
- Error condition 2
```

### 7. Update Compiler Guide (GUIDE.md)

If your opcode introduces new patterns, add examples to GUIDE.md:

```markdown
### New Pattern Name

\```
PUSH value
MY_NEW_OP
STORE result
\```

Description of the pattern and when to use it.
```

### 8. Add Examples (Optional)

If your opcode enables new functionality, add an example to `examples/`:

```typescript
// examples/my_feature.reef or examples/my_feature.ts
const example = toBytecode([
  // Demonstrate the new opcode
])
```

### Checklist Summary

When adding an opcode, update in this order:

- [ ] `src/opcode.ts` - Add enum value with docs
- [ ] `src/vm.ts` - Implement execution logic
- [ ] `src/validator.ts` - Add to operand requirement set
- [ ] `src/bytecode.ts` - Add to InstructionTuple type
- [ ] `tests/*.test.ts` - Write comprehensive tests (**REQUIRED**)
- [ ] `SPEC.md` - Document formal specification
- [ ] `GUIDE.md` - Add compiler patterns (if applicable)
- [ ] `examples/` - Add example code (if applicable)

Run `bun test` to verify all tests pass before committing.

## Common Gotchas

**Label requirements**: JUMP/JUMP_IF_FALSE/JUMP_IF_TRUE require label operands (`.label`), not numeric offsets. The bytecode compiler resolves labels to PC-relative offsets internally. PUSH_TRY/PUSH_FINALLY can use either labels or absolute instruction indices (`#N`).

**Stack operations**: Most binary operations pop in reverse order (second operand is popped first, then first operand).

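The pop order matters for non-commutative operations such as subtraction. The following is a minimal sketch using a free-standing `binaryOp` helper over a plain number stack. The helper is hypothetical and only mirrors the order described above; it is not the `this.binaryOp` method in src/vm.ts.

```typescript
// Hypothetical helper mirroring the pop order described above.
function binaryOp(stack: number[], fn: (a: number, b: number) => number): void {
  const b = stack.pop() // second operand: pushed last, popped first
  const a = stack.pop() // first operand
  if (a === undefined || b === undefined) throw new Error("Stack underflow")
  stack.push(fn(a, b))
}

// Equivalent of: PUSH 10, PUSH 3, SUB
const operands = [10, 3]
binaryOp(operands, (a, b) => a - b)
// operands is now [7]; popping in the other order would give [-7]
```
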
**MAKE_ARRAY operand**: Specifies count, not a stack index. `MAKE_ARRAY #3` pops 3 items.

**Finally blocks**: The compiler must generate explicit JUMPs to finally blocks for successful try/catch completion. The VM only auto-jumps to finally on THROW.

**Variable scoping**: STORE updates existing variables in parent scopes or creates in current scope. It does NOT shadow by default.

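The STORE rule can be illustrated with a toy scope chain. `ToyScope` and its methods are hypothetical names for this sketch. The real implementation is in src/scope.ts and may differ in detail.

```typescript
// Hypothetical scope chain for illustration only.
class ToyScope {
  private vars = new Map<string, unknown>()
  private parent: ToyScope | undefined

  constructor(parent?: ToyScope) {
    this.parent = parent
  }

  get(name: string): unknown {
    if (this.vars.has(name)) return this.vars.get(name)
    return this.parent ? this.parent.get(name) : undefined
  }

  // STORE semantics: update the nearest scope that already defines `name`;
  // otherwise create the variable in the current scope.
  set(name: string, value: unknown): void {
    let scope: ToyScope | undefined = this
    while (scope) {
      if (scope.vars.has(name)) {
        scope.vars.set(name, value)
        return
      }
      scope = scope.parent
    }
    this.vars.set(name, value)
  }

  hasOwn(name: string): boolean {
    return this.vars.has(name)
  }
}

const globalScope = new ToyScope()
const fnScope = new ToyScope(globalScope)

globalScope.set("count", 1)
fnScope.set("count", 2) // updates the outer variable, no shadowing
fnScope.set("tmp", 3)   // new name: created in the inner scope

// globalScope.get("count") is 2 and fnScope.hasOwn("count") is false
// globalScope.get("tmp") is undefined and fnScope.hasOwn("tmp") is true
```
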
**Identifiers**: Variable and parameter names support Unicode and emoji. Valid: `💎`, `🌟`, `変数`, `counter`. Invalid: names that start with a digit or a special prefix (`.`, `#`, `@`, `...`), or that contain whitespace or syntax characters.