CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

ReefVM is a stack-based bytecode virtual machine for the Shrimp programming language. It implements a complete VM with closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.

Essential reading: Before making changes, read README.md, SPEC.md, and GUIDE.md to understand the VM architecture, instruction set, and compiler patterns.

Development Commands

Running Files

bun <file.ts>              # Run TypeScript files directly
bun examples/native.ts     # Run example

Testing

bun test                   # Run all tests
bun test <file>            # Run specific test file
bun test --watch           # Watch mode

Tools

./bin/reef <file.reef>     # Execute bytecode file
./bin/validate <file.reef> # Validate bytecode
./bin/debug <file.reef>    # Step-by-step debugger
./bin/repl                 # Interactive REPL

Building

No build step required - Bun runs TypeScript directly.

Architecture

Core Components

VM Execution Model (src/vm.ts):

  • Stack-based execution with program counter (PC)
  • Call stack for function frames
  • Exception handler stack for try/catch/finally
  • Lexical scope chain with parent references (includes native functions)

Key subsystems:

  • bytecode.ts: Compiler that converts both string and array formats to executable bytecode. Handles label resolution, constant pool management, and function definition parsing. The toBytecode() function accepts either a string (human-readable) or typed array format (programmatic).
  • value.ts: Tagged union Value type system with type coercion functions (toNumber, toString, isTrue, isEqual)
  • scope.ts: Linked scope chain for variable resolution with lexical scoping
  • frame.ts: Call frame tracking for function calls and break targets
  • exception.ts: Exception handler records for try/catch/finally blocks
  • validator.ts: Bytecode validation to catch common errors before execution
  • opcode.ts: OpCode enum defining all VM instructions

Critical Design Decisions

Relative jumps: All JUMP instructions use PC-relative offsets (not absolute addresses), making bytecode position-independent. PUSH_TRY/PUSH_FINALLY use absolute addresses.

Truthiness semantics: Only null and false are falsy. Unlike JavaScript, 0, "", empty arrays, and empty dicts are truthy.
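As a rough illustration of that rule, here is a minimal sketch of a truthiness check. The `Value` union below is a simplified, hypothetical stand-in (only the `{ type, value }` shape for numbers appears in this document); the real definition lives in src/value.ts and may differ.

```typescript
// Hypothetical, simplified Value shape; the real union in src/value.ts has more variants.
type Value =
  | { type: "null"; value: null }
  | { type: "boolean"; value: boolean }
  | { type: "number"; value: number }
  | { type: "string"; value: string }
  | { type: "array"; value: Value[] }

// Only null and false are falsy; every other value is truthy.
function isTrue(v: Value): boolean {
  if (v.type === "null") return false
  if (v.type === "boolean") return v.value
  return true
}

const zero: Value = { type: "number", value: 0 }
const empty: Value = { type: "string", value: "" }
const nothing: Value = { type: "null", value: null }

const results = [isTrue(zero), isTrue(empty), isTrue(nothing)]  // [true, true, false]
```

The practical consequence for compilers targeting the VM: a loop guard such as "while count" never terminates on 0, so numeric checks need an explicit comparison.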

No AND/OR opcodes: Short-circuit logical operations are implemented at the compiler level using JUMP patterns with DUP.
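To show what such a JUMP pattern with DUP can look like, the sketch below uses a toy interpreter rather than ReefVM itself. It assumes that JUMP_IF_FALSE pops the value it tests and that offsets are relative to the next instruction; `runToy` and `compileAnd` are hypothetical names, and the authoritative patterns are the ones in GUIDE.md.

```typescript
// Toy interpreter, NOT ReefVM: just enough to show the DUP + JUMP pattern.
// Assumption: JUMP_IF_FALSE pops the tested value; offsets are relative to PC + 1.
type Val = number | string | boolean | null
type Instr =
  | ["PUSH", Val]
  | ["DUP"]
  | ["POP"]
  | ["JUMP_IF_FALSE", number]
  | ["HALT"]

// Same truthiness rule as the VM: only null and false are falsy.
const isTruthy = (v: Val): boolean => v !== null && v !== false

function runToy(program: Instr[]): Val {
  const stack: Val[] = []
  let pc = 0
  while (pc < program.length) {
    const instr = program[pc]
    if (instr === undefined) break
    pc += 1  // pc now points at the next instruction
    switch (instr[0]) {
      case "PUSH":
        stack.push(instr[1])
        break
      case "DUP":
        stack.push(stack[stack.length - 1] ?? null)
        break
      case "POP":
        stack.pop()
        break
      case "JUMP_IF_FALSE": {
        const cond = stack.pop() ?? null
        if (!isTruthy(cond)) pc += instr[1]
        break
      }
      case "HALT":
        return stack.pop() ?? null
    }
  }
  return stack.pop() ?? null
}

// Emits the instructions for `a and b`.
function compileAnd(a: Val, b: Val): Instr[] {
  return [
    ["PUSH", a],
    ["DUP"],               // keep a copy of `a` as the potential result
    ["JUMP_IF_FALSE", 2],  // falsy: skip POP and PUSH b, leaving `a` on the stack
    ["POP"],               // truthy: discard `a`
    ["PUSH", b],           // the result is `b`
    ["HALT"],
  ]
}

const andFalsy = runToy(compileAnd(null, 1))   // null: right side never evaluated
const andTruthy = runToy(compileAnd(0, "x"))   // "x": 0 is truthy, so `b` is the result
```

An OR pattern would be the mirror image, using JUMP_IF_TRUE in place of JUMP_IF_FALSE.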

Tail call optimization: TAIL_CALL reuses the current call frame instead of pushing a new one, so tail-recursive functions can recurse without growing the call stack.

Break semantics: CALL marks frames as break targets. BREAK unwinds the call stack to the most recent break target, enabling Ruby-style iterator patterns.

Exception handling: THROW jumps to finally (if present) or catch. The VM does NOT auto-jump to finally on successful try completion - compilers must explicitly generate JUMPs to finally blocks.

Parameter binding priority: Each fixed param takes its value from the first available source: a matching named arg, then a positional arg, then its default, then null. Named args that match no fixed param are collected into the @named dict parameter.
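The priority order can be sketched as a small binding function. This is an illustration, not the VM's code: `bindParams` and its types are hypothetical, and it assumes that a param satisfied by a named arg does not consume a positional arg.

```typescript
type Arg = number | string | boolean | null

// Hypothetical helper: binds fixed params using named > positional > default > null.
// Assumption: a param filled by a named arg does not use up a positional arg.
function bindParams(
  fixed: string[],               // fixed parameter names, in declaration order
  defaults: Map<string, Arg>,    // defaults for the params that declare one
  positional: Arg[],
  named: Map<string, Arg>,
): { bound: Map<string, Arg>; extraNamed: Map<string, Arg> } {
  const bound = new Map<string, Arg>()
  const extraNamed = new Map(named)  // whatever stays here would go to @named
  let next = 0                       // index of the next unused positional arg

  for (const name of fixed) {
    if (named.has(name)) {
      bound.set(name, named.get(name) ?? null)      // 1. named arg
      extraNamed.delete(name)
    } else if (next < positional.length) {
      bound.set(name, positional[next] ?? null)     // 2. positional arg
      next += 1
    } else {
      bound.set(name, defaults.get(name) ?? null)   // 3. default, else 4. null
    }
  }
  return { bound, extraNamed }
}

// A function declared with params (x y=10)
const fixed = ["x", "y"]
const defaults = new Map<string, Arg>([["y", 10]])

const a = bindParams(fixed, defaults, [5], new Map<string, Arg>([["y", 20]]))
// a.bound: x = 5, y = 20

const b = bindParams(fixed, defaults, [], new Map<string, Arg>([["verbose", true]]))
// b.bound: x = null, y = 10; b.extraNamed: verbose = true
```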

Native function calling: Native functions are stored in scope and called via LOAD + CALL, using the same calling convention as Reef functions. Named arguments are supported by extracting parameter names from the function signature at call time.

Testing Strategy

Tests are organized by feature area:

  • opcodes.test.ts: Stack ops, arithmetic, comparisons, variables, control flow
  • functions.test.ts: Function creation, calls, closures, defaults, variadic, named args
  • tail-call.test.ts: Tail call optimization and unbounded recursion
  • exceptions.test.ts: Try/catch/finally, exception unwinding, nested handlers
  • native.test.ts: Native function interop (sync and async)
  • functions-parameter.test.ts: Convenience parameter for passing functions to run() and VM
  • bytecode.test.ts: Bytecode string parser, label resolution, constants
  • programmatic.test.ts: Array format API, typed tuples, labels, functions
  • validator.test.ts: Bytecode validation rules
  • unicode.test.ts: Unicode and emoji identifiers
  • regex.test.ts: RegExp support
  • examples.test.ts: Integration tests for example programs

When adding features:

  1. Add unit tests for the specific opcode/feature
  2. Add integration tests showing real-world usage
  3. Update SPEC.md with formal specification
  4. Update GUIDE.md with compiler patterns
  5. Consider adding an example to examples/

Common Patterns

Writing Bytecode Tests

ReefVM supports two bytecode formats: string and array.

String format (human-readable):

import { toBytecode, run } from "#reef"

const bytecode = toBytecode(`
  PUSH 42
  STORE x
  LOAD x
  HALT
`)

const result = await run(bytecode)
// result is { type: 'number', value: 42 }

Array format (programmatic, type-safe):

import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],
  ["STORE", "x"],
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)
// result is { type: 'number', value: 42 }

Array format features:

  • Typed tuples for compile-time type checking
  • Labels defined as [".label:"] (single-element arrays with colon suffix)
  • Label references as strings: ["JUMP", ".label"] (no colon in references)
  • Function params as string arrays: ["MAKE_FUNCTION", ["x", "y=10"], ".body"]
  • See tests/programmatic.test.ts and examples/programmatic.ts for examples

Native Function Registration

Option 1: Pass to run() or VM constructor (convenience)

const result = await run(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`
})

// Or with the VM constructor (add and greet are functions defined elsewhere)
const vm = new VM(bytecode, { add, greet })

Option 2: Register with vm.registerFunction() (manual)

const vm = new VM(bytecode)
vm.registerFunction('add', (a: number, b: number) => a + b)
await vm.run()

Option 3: Register Value-based functions (for direct Value access)

vm.registerValueFunction('customOp', (a: Value, b: Value): Value => {
  return toValue(toNumber(a) + toNumber(b))
})

Auto-wrapping handles:

  • Value ↔ native type conversion (fromValue/toValue)
  • Sync and async functions
  • Arrays, objects, primitives, null, RegExp

Calling Reef Functions from TypeScript

Use vm.call() to invoke Reef functions from TypeScript:

const bytecode = toBytecode(`
  MAKE_FUNCTION (x y=10) .add
  STORE add
  HALT

  .add:
    LOAD x
    LOAD y
    ADD
    RETURN
`)

const vm = new VM(bytecode)
await vm.run()

// Positional arguments
const result1 = await vm.call('add', 5, 3)  // → 8

// Named arguments (pass final object)
const result2 = await vm.call('add', 5, { y: 20 })  // → 25

// All named arguments
const result3 = await vm.call('add', { x: 10, y: 15 })  // → 25

How it works:

  • Looks up function in VM scope
  • Converts it to a callable JavaScript function using fnFromValue
  • Automatically converts arguments to ReefVM Values
  • Executes the function in a fresh VM context
  • Converts result back to JavaScript types

Label Usage (Preferred)

Use labels instead of numeric offsets for readability:

JUMP .skip
PUSH 42
HALT
.skip:
  PUSH 99
  HALT

TypeScript Configuration

  • Import alias: #reef maps to ./src/index.ts
  • Module system: ES modules ("type": "module" in package.json)
  • Bun automatically handles TypeScript compilation

Bun-Specific Notes

  • Use bun instead of node, npm, pnpm, or vite
  • No need for dotenv - Bun loads .env automatically
  • Prefer Bun APIs over Node.js equivalents when available
  • See .cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc for detailed Bun usage

Adding a New OpCode

When adding a new instruction to ReefVM, you must update multiple files in a specific order. Follow this checklist:

1. Define the OpCode (src/opcode.ts)

Add the new opcode to the OpCode enum with comprehensive documentation:

export enum OpCode {
  // ... existing opcodes

  MY_NEW_OP,  // operand: <type> | stack: [inputs] → [outputs]
              // Description of what it does
              // Any important behavioral notes
}

2. Implement VM Execution (src/vm.ts)

Add a case to the execute() method's switch statement:

async execute(instruction: Instruction) {
  switch (instruction.op) {
    // ... existing cases

    case OpCode.MY_NEW_OP:
      // Implementation
      // - Pop values from this.stack as needed
      // - Perform the operation
      // - Push results to this.stack
      // - Throw errors for invalid operations
      // - Use await for async operations
      break
  }
}

Common helper methods:

  • this.binaryOp((a, b) => ...) - For binary arithmetic/comparison
  • toNumber(value), toString(value), isTrue(value), isEqual(a, b) - Type coercion
  • this.scope.get(name), this.scope.set(name, value) - Variable access

3. Update Validator (src/validator.ts)

Add the opcode to the appropriate set:

// If your opcode requires an operand:
const OPCODES_WITH_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])

// If your opcode takes no operand:
const OPCODES_WITHOUT_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])

If your opcode has complex operand validation, add a specific check in the validation loop around line 154.

4. Update Array API (src/bytecode.ts)

Add your instruction to the InstructionTuple type:

type InstructionTuple =
  // ... existing types
  | ["MY_NEW_OP"]                    // No operand
  | ["MY_NEW_OP", string]            // String operand
  | ["MY_NEW_OP", number]            // Number operand
  | ["MY_NEW_OP", string, number]    // Multiple operands

If your opcode has special operand handling, add a case in toBytecodeFromArray() around line 241.

5. Write Tests (REQUIRED)

Create tests in the appropriate test file:

// tests/opcodes.test.ts, tests/functions.test.ts, etc.
import { test, expect } from "bun:test"
import { toBytecode, run } from "#reef"

test("MY_NEW_OP description", async () => {
  const bytecode = toBytecode([
    // Setup
    ["PUSH", 42],
    ["MY_NEW_OP"],
    ["HALT"]
  ])

  const result = await run(bytecode)
  expect(result).toEqual({ type: "number", value: 42 })
})

// Test edge cases
test("MY_NEW_OP with invalid input", async () => {
  // Build bytecode that should fail, e.g. running the opcode on an empty stack
  const bytecode = toBytecode([
    ["MY_NEW_OP"],
    ["HALT"]
  ])

  await expect(run(bytecode)).rejects.toThrow()
})

ALWAYS write tests. Test both success cases and error conditions. Add integration tests showing real-world usage.

6. Document Specification (SPEC.md)

Add a formal specification entry:

#### MY_NEW_OP

**Operand**: `<type>`
**Stack**: `[input] → [output]`

Description of what the instruction does.

**Behavior**:
- Specific behavior point 1
- Specific behavior point 2

**Errors**:
- Error condition 1
- Error condition 2

7. Update Compiler Guide (GUIDE.md)

If your opcode introduces new patterns, add examples to GUIDE.md:

### New Pattern Name

\```
PUSH value
MY_NEW_OP
STORE result
\```

Description of the pattern and when to use it.

8. Add Examples (Optional)

If your opcode enables new functionality, add an example to examples/:

// examples/my_feature.reef or examples/my_feature.ts
const example = toBytecode([
  // Demonstrate the new opcode
])

Checklist Summary

When adding an opcode, update in this order:

  • src/opcode.ts - Add enum value with docs
  • src/vm.ts - Implement execution logic
  • src/validator.ts - Add to operand requirement set
  • src/bytecode.ts - Add to InstructionTuple type
  • tests/*.test.ts - Write comprehensive tests (REQUIRED)
  • SPEC.md - Document formal specification
  • GUIDE.md - Add compiler patterns (if applicable)
  • examples/ - Add example code (if applicable)

Run bun test to verify all tests pass before committing.

Common Gotchas

Jump offsets: JUMP/JUMP_IF_FALSE/JUMP_IF_TRUE use relative offsets from the next instruction (PC + 1). PUSH_TRY/PUSH_FINALLY use absolute instruction indices.
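A small worked example of the relative-offset arithmetic may help here. `relativeOffset` is a hypothetical helper for illustration, not a function exported by ReefVM; it also assumes labels do not occupy an instruction slot.

```typescript
// Hypothetical helper: compute the operand for a relative jump.
//   jumpIndex   - index of the JUMP instruction itself
//   targetIndex - index of the instruction the label points at
function relativeOffset(jumpIndex: number, targetIndex: number): number {
  // Offsets are measured from the instruction after the jump (PC + 1).
  return targetIndex - (jumpIndex + 1)
}

// Instruction indices for a small program:
//   0: JUMP .skip
//   1: PUSH 42
//   2: HALT
//   3: PUSH 99   <- .skip
//   4: HALT
const forward = relativeOffset(0, 3)   // 2: skips the two instructions at 1 and 2
const backward = relativeOffset(4, 0)  // -5: a jump at index 4 going back to index 0
const noop = relativeOffset(0, 1)      // 0: jumping to the very next instruction
```

For PUSH_TRY/PUSH_FINALLY no such arithmetic applies, since their operands are absolute instruction indices.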

Stack operations: Most binary operations pop in reverse order (second operand is popped first, then first operand).
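The pop order matters for non-commutative operations such as subtraction. The sketch below uses a plain number stack and a standalone `binaryOp` function, both hypothetical simplifications of the VM's `this.binaryOp` helper.

```typescript
// Toy stack, not the real VM: shows why operand order matters.
function binaryOp(stack: number[], fn: (a: number, b: number) => number): void {
  const b = stack.pop()  // second operand comes off the stack first
  const a = stack.pop()  // then the first operand
  if (a === undefined || b === undefined) throw new Error("stack underflow")
  stack.push(fn(a, b))
}

const stack: number[] = []
stack.push(10)                     // PUSH 10
stack.push(3)                      // PUSH 3
binaryOp(stack, (a, b) => a - b)   // a subtraction computes 10 - 3, not 3 - 10
// stack is now [7]
```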

MAKE_ARRAY operand: Specifies count, not a stack index. MAKE_ARRAY #3 pops 3 items.
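The count semantics can be sketched as follows. `makeArray` is a hypothetical helper, and the sketch assumes that items keep the order in which they were pushed; check SPEC.md for the actual ordering.

```typescript
// Toy version of the MAKE_ARRAY #n idea: n is a count of items to pop,
// not an index into the stack.
// Assumption: items keep the order in which they were pushed.
function makeArray<T>(stack: T[], count: number): T[] {
  if (count > stack.length) throw new Error("stack underflow")
  // A negative start makes splice remove the top `count` items in push order.
  return count === 0 ? [] : stack.splice(-count, count)
}

const operands = [1, 2, 3, 4]
const items = makeArray(operands, 3)  // removes the top three items: [2, 3, 4]
// operands is now [1]
```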

Finally blocks: The compiler must generate explicit JUMPs to finally blocks for successful try/catch completion. The VM only auto-jumps to finally on THROW.

Variable scoping: STORE updates a variable that already exists in the scope chain, including parent scopes; otherwise it creates the variable in the current scope. It does NOT shadow by default.
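A minimal sketch of that update-or-create behavior, using a hypothetical `Scope` class with number-only values (the real implementation is in src/scope.ts and stores Values):

```typescript
// Hypothetical scope chain for illustration only.
class Scope {
  private vars = new Map<string, number>()
  private parent: Scope | null

  constructor(parent: Scope | null = null) {
    this.parent = parent
  }

  get(name: string): number | undefined {
    if (this.vars.has(name)) return this.vars.get(name)
    return this.parent ? this.parent.get(name) : undefined
  }

  // STORE-like behavior: update the nearest scope that already has `name`,
  // otherwise create the variable in the current scope.
  set(name: string, value: number): void {
    let scope: Scope | null = this
    while (scope) {
      if (scope.vars.has(name)) {
        scope.vars.set(name, value)
        return
      }
      scope = scope.parent
    }
    this.vars.set(name, value)
  }

  hasOwn(name: string): boolean {
    return this.vars.has(name)
  }
}

const globals = new Scope()
const local = new Scope(globals)

globals.set("x", 1)
local.set("x", 2)  // updates the existing x in globals; local gets no x of its own
local.set("y", 3)  // y exists nowhere yet, so it is created in local
```

After these calls, `globals.get("x")` is 2 and `y` is visible only through `local`.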

Identifiers: Variable and parameter names support Unicode and emoji. Valid: 💎, 🌟, 変数, counter. Invalid: names that start with a digit or a special prefix (., #, @, ...), or that contain whitespace or syntax characters.
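The stated rules can be approximated with a short check. `looksLikeValidIdentifier` is a hypothetical helper that covers only the rules written above; the exact set of "syntax characters" is not listed in this document, so it is not modelled, and the real parser may reject more.

```typescript
// Hypothetical check covering only the rules stated in this document.
function looksLikeValidIdentifier(name: string): boolean {
  if (name.length === 0) return false        // assumption: empty names are invalid
  if (/^[0-9.#@]/.test(name)) return false   // starts with a digit or special prefix
  if (/\s/.test(name)) return false          // contains whitespace
  return true
}

const accepted = ["💎", "🌟", "変数", "counter"].every(looksLikeValidIdentifier)
const rejected = ["1abc", ".label", "#3", "@named", "my var"].some(looksLikeValidIdentifier)
// accepted is true, rejected is false
```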