ReefVM/CLAUDE.md

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

ReefVM is a stack-based bytecode virtual machine for the Shrimp programming language. It implements a complete VM with closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.

**Essential reading**: Before making changes, read README.md, SPEC.md, and GUIDE.md to understand the VM architecture, instruction set, and compiler patterns.

## Development Commands

### Running Files
```bash
bun <file.ts>              # Run TypeScript files directly
bun examples/native.ts     # Run example
```

### Testing
```bash
bun test                   # Run all tests
bun test <file>            # Run specific test file
bun test --watch           # Watch mode
```

### Tools
```bash
./bin/reef <file.reef>     # Execute bytecode file
./bin/validate <file.reef> # Validate bytecode
./bin/debug <file.reef>    # Step-by-step debugger
./bin/repl                 # Interactive REPL
```

### Building
No build step required - Bun runs TypeScript directly.

## Architecture

### Core Components

**VM Execution Model** (src/vm.ts):
- Stack-based execution with program counter (PC)
- Call stack for function frames
- Exception handler stack for try/catch/finally
- Lexical scope chain with parent references (includes native functions)

**Key subsystems**:
- **bytecode.ts**: Compiler that converts both string and array formats to executable bytecode. Handles label resolution, constant pool management, and function definition parsing. The `toBytecode()` function accepts either a string (human-readable) or typed array format (programmatic).
- **value.ts**: Tagged union Value type system with type coercion functions (toNumber, toString, isTrue, isEqual)
- **scope.ts**: Linked scope chain for variable resolution with lexical scoping
- **frame.ts**: Call frame tracking for function calls and break targets
- **exception.ts**: Exception handler records for try/catch/finally blocks
- **validator.ts**: Bytecode validation to catch common errors before execution
- **opcode.ts**: OpCode enum defining all VM instructions

### Critical Design Decisions

**Label-based jumps**: All JUMP instructions (`JUMP`, `JUMP_IF_FALSE`, `JUMP_IF_TRUE`) require label operands (`.label`), not numeric offsets. Labels are resolved to PC-relative offsets during compilation, making bytecode position-independent. PUSH_TRY/PUSH_FINALLY use absolute addresses and accept either labels or absolute instruction indices (`#N`).
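
A minimal sketch of two-pass label resolution. `resolveLabels` and its types are hypothetical, not ReefVM's API, and the sketch assumes offsets are measured from the instruction after the jump (SPEC.md is authoritative on the real base).

```typescript
// Hypothetical shapes for illustration; not ReefVM's actual types.
type Source = { label: string } | { op: string; target?: string }
type Resolved = { op: string; offset?: number }

function resolveLabels(source: Source[]): Resolved[] {
  // Pass 1: map each label to the index of the instruction that follows it.
  const labels = new Map<string, number>()
  let pc = 0
  for (const item of source) {
    if ("label" in item) labels.set(item.label, pc)
    else pc++
  }

  // Pass 2: replace label references with PC-relative offsets.
  const out: Resolved[] = []
  for (const item of source) {
    if ("label" in item) continue
    if (item.target === undefined) {
      out.push({ op: item.op })
      continue
    }
    const target = labels.get(item.target)
    if (target === undefined) throw new Error(`Undefined label: ${item.target}`)
    // ASSUMPTION: offset is relative to the next instruction (current index + 1).
    out.push({ op: item.op, offset: target - (out.length + 1) })
  }
  return out
}

const resolved = resolveLabels([
  { op: "JUMP", target: ".skip" },
  { op: "PUSH" },
  { op: "HALT" },
  { label: ".skip" },
  { op: "PUSH" },
  { op: "HALT" },
])
// resolved[0] is { op: "JUMP", offset: 2 }: the jump skips the two instructions after it
```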

**Truthiness semantics**: Only `null` and `false` are falsy. Unlike JavaScript, `0`, `""`, empty arrays, and empty dicts are truthy.
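
The rule can be sketched as a predicate over a tagged value. The `Value` union below is a hypothetical subset (only the `number` shape appears elsewhere in this file; the other tag names are assumptions), and `isTruthy` stands in for the real `isTrue` in value.ts.

```typescript
// Hypothetical subset of a tagged Value union; tag names other than "number" are assumptions.
type Value =
  | { type: "null"; value: null }
  | { type: "boolean"; value: boolean }
  | { type: "number"; value: number }
  | { type: "string"; value: string }
  | { type: "array"; value: Value[] }

function isTruthy(v: Value): boolean {
  if (v.type === "null") return false
  if (v.type === "boolean") return v.value
  return true // numbers (including 0), strings (including ""), and arrays (including []) are truthy
}

isTruthy({ type: "number", value: 0 })     // true
isTruthy({ type: "string", value: "" })    // true
isTruthy({ type: "boolean", value: false }) // false
```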

**No AND/OR opcodes**: Short-circuit logical operations are implemented at the compiler level using JUMP patterns with DUP.
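
To illustrate the idea, here is a toy interpreter running a short-circuit AND built from DUP and a conditional jump. It is not ReefVM's instruction format: it assumes `JUMP_IF_FALSE` pops the value it tests, assumes a `POP` opcode exists, and uses absolute jump targets for brevity. See GUIDE.md for the actual compiler pattern.

```typescript
// Toy interpreter for illustration only; not ReefVM's instruction format.
type Op =
  | { op: "PUSH"; value: unknown }
  | { op: "DUP" }
  | { op: "POP" } // ASSUMPTION: a POP opcode exists
  | { op: "JUMP_IF_FALSE"; target: number } // absolute index, to keep the sketch short
  | { op: "HALT" }

// Only null and false are falsy.
const isFalsy = (v: unknown): boolean => v === null || v === false

function runToy(program: Op[]): unknown {
  const stack: unknown[] = []
  let pc = 0
  while (pc < program.length) {
    const ins = program[pc++]!
    switch (ins.op) {
      case "PUSH": stack.push(ins.value); break
      case "DUP": stack.push(stack[stack.length - 1]); break
      case "POP": stack.pop(); break
      case "JUMP_IF_FALSE": // ASSUMPTION: pops the value it tests
        if (isFalsy(stack.pop())) pc = ins.target
        break
      case "HALT": return stack.pop()
    }
  }
  return stack.pop()
}

// `a and b`: keep `a` when it is falsy, otherwise discard it and use `b`.
function andPattern(a: unknown, b: unknown): Op[] {
  return [
    { op: "PUSH", value: a },
    { op: "DUP" },
    { op: "JUMP_IF_FALSE", target: 5 }, // consumes the duplicate; the original `a` remains
    { op: "POP" },
    { op: "PUSH", value: b },
    { op: "HALT" },
  ]
}

runToy(andPattern(false, 1)) // false: right-hand side is skipped
runToy(andPattern(0, "x"))   // "x": 0 is truthy, so the right-hand side is used
```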

**Tail call optimization**: TAIL_CALL reuses the current call frame instead of pushing a new one, so tail-recursive functions can recurse without growing the call stack.
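
A rough model of the difference, using an explicit frame stack for a recursive countdown. `peakDepth` and `Frame` are hypothetical names for this sketch and do not mirror src/frame.ts.

```typescript
// Hypothetical frame shape for this sketch only.
type Frame = { fn: string; arg: number }

// Simulates countdown(n) calling countdown(n - 1) and reports the peak frame-stack depth.
function peakDepth(n: number, useTailCall: boolean): number {
  const frames: Frame[] = [{ fn: "countdown", arg: n }]
  let peak = 1
  while (frames.length > 0) {
    const top = frames[frames.length - 1]!
    if (top.arg === 0) {
      frames.length = 0 // base case: in this toy, every pending frame simply returns
      break
    }
    const next: Frame = { fn: "countdown", arg: top.arg - 1 }
    if (useTailCall) frames[frames.length - 1] = next // TAIL_CALL: overwrite the current frame
    else frames.push(next)                            // CALL: push a new frame
    peak = Math.max(peak, frames.length)
  }
  return peak
}

peakDepth(1000, false) // 1001 frames at the deepest point
peakDepth(1000, true)  // 1 frame, regardless of n
```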

**Break semantics**: CALL marks frames as break targets. BREAK unwinds the call stack to the most recent break target, enabling Ruby-style iterator patterns.
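
The unwinding step might look like the sketch below. `unwindToBreakTarget` and the `Frame` shape are hypothetical; the sketch assumes the break-target frame itself is popped and that execution resumes at its return address, which is not confirmed here. The flags in the example are illustrative only.

```typescript
// Hypothetical frame shape; the real one lives in src/frame.ts.
type Frame = { name: string; isBreakTarget: boolean }

// Pops frames until the most recent break target has been removed, then returns it.
function unwindToBreakTarget(callStack: Frame[]): Frame {
  while (callStack.length > 0) {
    const frame = callStack.pop()!
    if (frame.isBreakTarget) return frame // ASSUMPTION: the target frame is popped too
  }
  throw new Error("BREAK with no break target on the call stack")
}

const callStack: Frame[] = [
  { name: "outer", isBreakTarget: false },
  { name: "iterator", isBreakTarget: true },
  { name: "inner", isBreakTarget: false },
]
const target = unwindToBreakTarget(callStack)
// target.name is "iterator"; only "outer" remains on callStack
```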

**Exception handling**: THROW jumps to finally (if present) or catch. The VM does NOT auto-jump to finally on successful try completion - compilers must explicitly generate JUMPs to finally blocks.

**Parameter binding priority**: Named args bind to fixed params first. Unmatched named args go to the `@named` dict parameter. Each fixed param takes its value from the first available source, in this order: named arg, positional arg, default, null.
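
A sketch of that priority order. `bindParams` and `Param` are hypothetical names, and the sketch assumes positional args fill the params not already bound by name, in declaration order; check SPEC.md for the exact rule.

```typescript
// Hypothetical parameter description for this sketch.
type Param = { name: string; default?: unknown }

function bindParams(
  params: Param[],
  positional: unknown[],
  named: Record<string, unknown>,
): { bound: Record<string, unknown>; extraNamed: Record<string, unknown> } {
  const bound: Record<string, unknown> = {}
  const extraNamed: Record<string, unknown> = { ...named } // leftovers go to the @named dict
  const has = (o: object, k: string) => Object.prototype.hasOwnProperty.call(o, k)
  let next = 0 // index of the next unused positional argument

  for (const p of params) {
    if (has(extraNamed, p.name)) {
      bound[p.name] = extraNamed[p.name] // 1. named arg
      delete extraNamed[p.name]
    } else if (next < positional.length) {
      bound[p.name] = positional[next++] // 2. positional arg (ASSUMPTION: skips name-bound params)
    } else if (has(p, "default")) {
      bound[p.name] = p.default          // 3. default
    } else {
      bound[p.name] = null               // 4. null
    }
  }
  return { bound, extraNamed }
}

const params: Param[] = [{ name: "x" }, { name: "y", default: 10 }]
bindParams(params, [5], { y: 20 }).bound // { x: 5, y: 20 }
bindParams(params, [5], {}).bound        // { x: 5, y: 10 }
bindParams(params, [], {}).bound         // { x: null, y: 10 }
```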

**Native function calling**: Native functions are stored in scope and called via LOAD + CALL, using the same calling convention as Reef functions. Named arguments are supported by extracting parameter names from the function signature at call time.

## Testing Strategy

Tests are organized by feature area:
- **opcodes.test.ts**: Stack ops, arithmetic, comparisons, variables, control flow
- **functions.test.ts**: Function creation, calls, closures, defaults, variadic, named args
- **tail-call.test.ts**: Tail call optimization and unbounded recursion
- **exceptions.test.ts**: Try/catch/finally, exception unwinding, nested handlers
- **native.test.ts**: Native function interop (sync and async)
- **functions-parameter.test.ts**: Convenience parameter for passing functions to run() and VM
- **bytecode.test.ts**: Bytecode string parser, label resolution, constants
- **programmatic.test.ts**: Array format API, typed tuples, labels, functions
- **validator.test.ts**: Bytecode validation rules
- **unicode.test.ts**: Unicode and emoji identifiers
- **regex.test.ts**: RegExp support
- **examples.test.ts**: Integration tests for example programs

When adding features:
1. Add unit tests for the specific opcode/feature
2. Add integration tests showing real-world usage
3. Update SPEC.md with formal specification
4. Update GUIDE.md with compiler patterns
5. Consider adding an example to examples/

## Common Patterns

### Writing Bytecode Tests

ReefVM supports two bytecode formats: string and array.

**String format** (human-readable):
```typescript
import { toBytecode, run } from "#reef"

const bytecode = toBytecode(`
  PUSH 42
  STORE x
  LOAD x
  HALT
`)

const result = await run(bytecode)
// result is { type: 'number', value: 42 }
```

**Array format** (programmatic, type-safe):
```typescript
import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],
  ["STORE", "x"],
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)
// result is { type: 'number', value: 42 }
```

Array format features:
- Typed tuples for compile-time type checking
- Labels defined as `[".label:"]` (single-element arrays with colon suffix)
- Label references as strings: `["JUMP", ".label"]` (no colon in references)
- Function params as string arrays: `["MAKE_FUNCTION", ["x", "y=10"], ".body"]`
- See `tests/programmatic.test.ts` and `examples/programmatic.ts` for examples

### Native Function Registration and Global Values

**Option 1**: Pass to `run()` or `VM` constructor (convenience)
```typescript
const result = await run(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`,
  pi: 3.14159,
  config: { debug: true, port: 8080 }
})

// Or with VM constructor
const vm = new VM(bytecode, { add, greet, pi, config })
```

**Option 2**: Set values with `vm.set()` (manual)
```typescript
const vm = new VM(bytecode)

// Set functions (auto-wrapped to native functions)
vm.set('add', (a: number, b: number) => a + b)

// Set any other values (auto-converted to ReefVM Values)
vm.set('pi', 3.14159)
vm.set('config', { debug: true, port: 8080 })

await vm.run()
```

**Option 3**: Set Value-based functions with `vm.setValueFunction()` (advanced)

For functions that work directly with ReefVM Value types:

```typescript
const vm = new VM(bytecode)

// Set Value-based function (no wrapping, works directly with Values)
vm.setValueFunction('customOp', (a: Value, b: Value): Value => {
  return toValue(toNumber(a) + toNumber(b))
})

await vm.run()
```

Auto-wrapping handles:
- Functions: wrapped as native functions with Value ↔ native type conversion
- Sync and async functions
- Arrays, objects, primitives, null, RegExp
- All values converted via `toValue()`

### Calling Functions from TypeScript

Use `vm.call()` to invoke Reef or native functions from TypeScript:

```typescript
const bytecode = toBytecode(`
  MAKE_FUNCTION (x y=10) .add
  STORE add
  HALT

  .add:
    LOAD x
    LOAD y
    ADD
    RETURN
`)

const vm = new VM(bytecode, {
  log: (msg: string) => console.log(msg)  // Native function
})
await vm.run()

// Call Reef function with positional arguments
const result1 = await vm.call('add', 5, 3)  // → 8

// Call Reef function with named arguments (pass final object)
const result2 = await vm.call('add', 5, { y: 20 })  // → 25

// Call Reef function with all named arguments
const result3 = await vm.call('add', { x: 10, y: 15 })  // → 25

// Call native function
await vm.call('log', 'Hello!')
```

**How it works**:
- Looks up function (Reef or native) in VM scope
- For Reef functions: converts to callable JavaScript function using `fnFromValue`
- For native functions: calls directly
- Automatically converts arguments to ReefVM Values
- Converts result back to JavaScript types
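
The "pass final object" convention for named arguments can be pictured as the split below. This is a sketch under an assumption: how `vm.call()` actually detects the named-argument object is not specified here, so `splitCallArgs` (a hypothetical helper) treats only a trailing plain object literal as named args.

```typescript
// Hypothetical helper; not part of ReefVM's API.
function splitCallArgs(args: unknown[]): {
  positional: unknown[]
  named: Record<string, unknown>
} {
  const last = args[args.length - 1]
  // ASSUMPTION: only a plain object (not an array, RegExp, or class instance) counts as named args.
  const isPlainObject =
    typeof last === "object" &&
    last !== null &&
    Object.getPrototypeOf(last) === Object.prototype
  if (isPlainObject) {
    return { positional: args.slice(0, -1), named: last as Record<string, unknown> }
  }
  return { positional: args, named: {} }
}

splitCallArgs([5, 3])              // positional: [5, 3], named: {}
splitCallArgs([5, { y: 20 }])      // positional: [5],    named: { y: 20 }
splitCallArgs([{ x: 10, y: 15 }])  // positional: [],     named: { x: 10, y: 15 }
```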

### Label Usage (Required for JUMP instructions)
All JUMP instructions must use labels:
```
JUMP .skip
PUSH 42
HALT
.skip:
  PUSH 99
  HALT
```

### Function Definition Patterns

When defining functions, you MUST prevent the PC from falling through into function bodies. Two patterns:

**Pattern 1: JUMP over function bodies (Recommended)**
```
MAKE_FUNCTION (params) .body
STORE function_name
JUMP .end              ; Skip over function body
.body:
  <function code>
  RETURN
.end:
  <continue with program>
```

**Pattern 2: Function bodies after HALT**
```
MAKE_FUNCTION (params) .body
STORE function_name
<use the function>
HALT                   ; Stop before function bodies
.body:
  <function code>
  RETURN
```

Pattern 1 is required for:
- Defining multiple functions before using them
- REPL mode
- Any case where execution continues after defining a function

Pattern 2 only works if you HALT before reaching function bodies.

### REPL Mode (Incremental Execution)

For building REPLs (like the Shrimp REPL), use `vm.continue()` and `vm.appendBytecode()`:

```typescript
const vm = new VM(toBytecode([]), natives)
await vm.run()  // Initialize (empty bytecode)

// User enters: x = 42
const line1 = compileLine("x = 42")  // No HALT!
vm.appendBytecode(line1)
await vm.continue()  // Execute only line 1

// User enters: x + 10
const line2 = compileLine("x + 10")  // No HALT!
vm.appendBytecode(line2)
await vm.continue()  // Execute only line 2, result is 52
```

**Key points**:
- `vm.run()` resets PC to 0 (re-executes everything) - use for initial setup only
- `vm.continue()` resumes from current PC (executes only new bytecode)
- `vm.appendBytecode(bytecode)` properly handles constant index remapping
- Don't use HALT in REPL lines - let VM stop naturally
- Scope and variables persist across all lines
- Side effects run only once, because `vm.continue()` never re-executes earlier lines
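
Constant index remapping, as mentioned for `vm.appendBytecode()`, can be sketched as follows. The `Chunk` and `Instr` shapes are hypothetical and assume each compiled line carries its own constant pool with zero-based indices; the real bytecode layout may differ.

```typescript
// Hypothetical shapes; not ReefVM's actual bytecode layout.
type Instr = { op: string; constIndex?: number }
type Chunk = { instructions: Instr[]; constants: unknown[] }

// Appends `extra` to `base`, shifting constant indices so they point into the merged pool.
function appendChunk(base: Chunk, extra: Chunk): void {
  const shift = base.constants.length
  base.constants.push(...extra.constants)
  for (const ins of extra.instructions) {
    base.instructions.push(
      ins.constIndex === undefined
        ? { ...ins }
        : { ...ins, constIndex: ins.constIndex + shift },
    )
  }
}

const program: Chunk = {
  instructions: [{ op: "PUSH", constIndex: 0 }, { op: "STORE", constIndex: 1 }],
  constants: [42, "x"],
}
const nextLine: Chunk = {
  instructions: [{ op: "LOAD", constIndex: 0 }, { op: "PUSH", constIndex: 1 }, { op: "ADD" }],
  constants: ["x", 10],
}
appendChunk(program, nextLine)
// The appended LOAD now uses constIndex 2 and the appended PUSH uses constIndex 3
```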

## TypeScript Configuration

- Import alias: `#reef` maps to `./src/index.ts`
- Module system: ES modules (`"type": "module"` in package.json)
- Bun automatically handles TypeScript compilation

## Bun-Specific Notes

- Use `bun` instead of `node`, `npm`, `pnpm`, or `vite`
- No need for dotenv - Bun loads .env automatically
- Prefer Bun APIs over Node.js equivalents when available
- See .cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc for detailed Bun usage

## Adding a New OpCode

When adding a new instruction to ReefVM, you must update multiple files in a specific order. Follow this checklist:

### 1. Define the OpCode (src/opcode.ts)

Add the new opcode to the `OpCode` enum with comprehensive documentation:

```typescript
export enum OpCode {
  // ... existing opcodes

  MY_NEW_OP,  // operand: <type> | stack: [inputs] → [outputs]
              // Description of what it does
              // Any important behavioral notes
}
```

### 2. Implement VM Execution (src/vm.ts)

Add a case to the `execute()` method's switch statement:

```typescript
async execute(instruction: Instruction) {
  switch (instruction.op) {
    // ... existing cases

    case OpCode.MY_NEW_OP:
      // Implementation
      // - Pop values from this.stack as needed
      // - Perform the operation
      // - Push results to this.stack
      // - Throw errors for invalid operations
      // - Use await for async operations
      break
  }
}
```

Common helper methods:
- `this.binaryOp((a, b) => ...)` - For binary arithmetic/comparison
- `toNumber(value)`, `toString(value)`, `isTrue(value)`, `isEqual(a, b)` - Type coercion
- `this.scope.get(name)`, `this.scope.set(name, value)` - Variable access

### 3. Update Validator (src/validator.ts)

Add the opcode to the appropriate set:

```typescript
// If your opcode requires an operand:
const OPCODES_WITH_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])

// If your opcode takes no operand:
const OPCODES_WITHOUT_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])
```

If your opcode has complex operand validation, add a specific check in the validation loop around line 154.

### 4. Update Array API (src/bytecode.ts)

Add your instruction to the `InstructionTuple` type:

```typescript
type InstructionTuple =
  // ... existing types
  | ["MY_NEW_OP"]                    // No operand
  | ["MY_NEW_OP", string]            // String operand
  | ["MY_NEW_OP", number]            // Number operand
  | ["MY_NEW_OP", string, number]    // Multiple operands
```

If your opcode has special operand handling, add a case in `toBytecodeFromArray()` around line 241.

### 5. Write Tests (REQUIRED)

Create tests in the appropriate test file:

```typescript
// tests/opcodes.test.ts, tests/functions.test.ts, etc.

test("MY_NEW_OP description", async () => {
  const bytecode = toBytecode([
    // Setup
    ["PUSH", 42],
    ["MY_NEW_OP"],
    ["HALT"]
  ])

  const result = await run(bytecode)
  expect(result).toEqual({ type: "number", value: 42 })
})

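// Test edge cases
test("MY_NEW_OP with invalid input", async () => {
  // Build bytecode that should fail, e.g. running the op on an empty stack
  // (assumes MY_NEW_OP needs at least one input)
  const bytecode = toBytecode([
    ["MY_NEW_OP"],
    ["HALT"]
  ])

  await expect(run(bytecode)).rejects.toThrow()
})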
```

**ALWAYS write tests.** Test both success cases and error conditions. Add integration tests showing real-world usage.

### 6. Document Specification (SPEC.md)

Add a formal specification entry:

```markdown
#### MY_NEW_OP

**Operand**: `<type>`
**Stack**: `[input] → [output]`

Description of what the instruction does.

**Behavior**:
- Specific behavior point 1
- Specific behavior point 2

**Errors**:
- Error condition 1
- Error condition 2
```

### 7. Update Compiler Guide (GUIDE.md)

If your opcode introduces new patterns, add examples to GUIDE.md:

```markdown
### New Pattern Name

\```
PUSH value
MY_NEW_OP
STORE result
\```

Description of the pattern and when to use it.
```

### 8. Add Examples (Optional)

If your opcode enables new functionality, add an example to `examples/`:

```typescript
// examples/my_feature.reef or examples/my_feature.ts
const example = toBytecode([
  // Demonstrate the new opcode
])
```

### Checklist Summary

When adding an opcode, update in this order:

- [ ] `src/opcode.ts` - Add enum value with docs
- [ ] `src/vm.ts` - Implement execution logic
- [ ] `src/validator.ts` - Add to operand requirement set
- [ ] `src/bytecode.ts` - Add to InstructionTuple type
- [ ] `tests/*.test.ts` - Write comprehensive tests (**REQUIRED**)
- [ ] `SPEC.md` - Document formal specification
- [ ] `GUIDE.md` - Add compiler patterns (if applicable)
- [ ] `examples/` - Add example code (if applicable)

Run `bun test` to verify all tests pass before committing.

## Common Gotchas

**Label requirements**: JUMP/JUMP_IF_FALSE/JUMP_IF_TRUE require label operands (`.label`), not numeric offsets. The bytecode compiler resolves labels to PC-relative offsets internally. PUSH_TRY/PUSH_FINALLY can use either labels or absolute instruction indices (`#N`).

**Stack operations**: Most binary operations pop in reverse order (second operand is popped first, then first operand).
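
A minimal standalone sketch of that pop order (the VM's own `this.binaryOp` helper is not reproduced here; this free function is a hypothetical stand-in):

```typescript
// Hypothetical stand-in for a binary-op helper, operating on plain numbers.
function binaryOp(stack: number[], fn: (a: number, b: number) => number): void {
  const b = stack.pop() // second operand comes off first
  const a = stack.pop() // then the first operand
  if (a === undefined || b === undefined) throw new Error("Stack underflow")
  stack.push(fn(a, b))
}

const stack = [10, 3] // as if by PUSH 10, PUSH 3
binaryOp(stack, (a, b) => a - b)
// stack is now [7], not [-7]
```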

**MAKE_ARRAY operand**: Specifies count, not a stack index. `MAKE_ARRAY #3` pops 3 items.
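
In other words, the operand says how many items to take off the top of the stack. A sketch with a hypothetical `makeArray` helper, assuming the resulting array keeps the items in the order they were pushed:

```typescript
// Hypothetical helper; removes the top `count` items and returns them as an array.
function makeArray<T>(stack: T[], count: number): T[] {
  if (count > stack.length) throw new Error("Stack underflow")
  // ASSUMPTION: elements keep push order (deepest of the popped items comes first).
  return stack.splice(stack.length - count, count)
}

const operands = ["a", "b", "c", "d"]
const items = makeArray(operands, 3)
// items is ["b", "c", "d"]; operands is now ["a"]
```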

**Finally blocks**: The compiler must generate explicit JUMPs to finally blocks for successful try/catch completion. The VM only auto-jumps to finally on THROW.

**Variable scoping**: STORE updates an existing variable, including one defined in a parent scope; if no such variable exists, it creates one in the current scope. It does NOT shadow by default.
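
A sketch of that lookup-then-create behavior. `ScopeSketch` is a hypothetical class written for this example; the real implementation is in src/scope.ts and may differ.

```typescript
// Hypothetical scope chain for illustration; not the class in src/scope.ts.
class ScopeSketch {
  private vars = new Map<string, unknown>()
  private parent: ScopeSketch | undefined

  constructor(parent?: ScopeSketch) {
    this.parent = parent
  }

  // True only if this scope itself defines the name.
  hasOwn(name: string): boolean {
    return this.vars.has(name)
  }

  // Walks up the chain and returns the first binding found.
  get(name: string): unknown {
    let s: ScopeSketch | undefined = this
    while (s) {
      if (s.vars.has(name)) return s.vars.get(name)
      s = s.parent
    }
    return undefined
  }

  // Updates the nearest existing binding; otherwise creates one in this scope.
  set(name: string, value: unknown): void {
    let s: ScopeSketch | undefined = this
    while (s) {
      if (s.vars.has(name)) {
        s.vars.set(name, value)
        return
      }
      s = s.parent
    }
    this.vars.set(name, value)
  }
}

const globalScope = new ScopeSketch()
globalScope.set("x", 1)
const fnScope = new ScopeSketch(globalScope)
fnScope.set("x", 2) // updates the parent's x; no shadow is created
fnScope.set("y", 3) // y is new, so it is created in fnScope
```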

**Identifiers**: Variable and parameter names support Unicode and emoji. Valid: `💎`, `🌟`, `変数`, `counter`. Invalid: names that start with a digit or a special prefix (`.`, `#`, `@`, `...`), or that contain whitespace or syntax characters.