Chris Wanstrath b2a6021fb8 require labels for JUMP opcodes to avoid compiler bugs

2025-11-09 22:18:10 -08:00

16 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

ReefVM is a stack-based bytecode virtual machine for the Shrimp programming language. It implements a complete VM with closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.

Essential reading: Before making changes, read README.md, SPEC.md, and GUIDE.md to understand the VM architecture, instruction set, and compiler patterns.

Development Commands

Running Files

bun <file.ts>              # Run TypeScript files directly
bun examples/native.ts     # Run example

Testing

bun test                   # Run all tests
bun test <file>            # Run specific test file
bun test --watch           # Watch mode

Tools

./bin/reef <file.reef>     # Execute bytecode file
./bin/validate <file.reef> # Validate bytecode
./bin/debug <file.reef>    # Step-by-step debugger
./bin/repl                 # Interactive REPL

Building

No build step required - Bun runs TypeScript directly.

Architecture

Core Components

VM Execution Model (src/vm.ts):

Stack-based execution with program counter (PC)
Call stack for function frames
Exception handler stack for try/catch/finally
Lexical scope chain with parent references (includes native functions)

Key subsystems:

bytecode.ts: Compiler that converts both string and array formats to executable bytecode. Handles label resolution, constant pool management, and function definition parsing. The toBytecode() function accepts either a string (human-readable) or an array of typed instruction tuples (programmatic).
value.ts: Tagged union Value type system with type coercion functions (toNumber, toString, isTrue, isEqual)
scope.ts: Linked scope chain for variable resolution with lexical scoping
frame.ts: Call frame tracking for function calls and break targets
exception.ts: Exception handler records for try/catch/finally blocks
validator.ts: Bytecode validation to catch common errors before execution
opcode.ts: OpCode enum defining all VM instructions

Critical Design Decisions

Label-based jumps: All JUMP instructions (JUMP, JUMP_IF_FALSE, JUMP_IF_TRUE) require label operands (.label), not numeric offsets. Labels are resolved to PC-relative offsets during compilation, making bytecode position-independent. PUSH_TRY/PUSH_FINALLY use absolute addresses and can accept either labels or numeric offsets.

Truthiness semantics: Only null and false are falsy. Unlike JavaScript, 0, "", empty arrays, and empty dicts are truthy.
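
The rule can be sketched as a single predicate. This is a minimal illustration, not the code in value.ts: the Value union below is a simplified assumption (only the { type: 'number', value } shape appears in this file), and the name isTruthy is hypothetical; the real helper is isTrue.

```typescript
// Hypothetical, simplified Value union; the real definition lives in src/value.ts.
type Value =
  | { type: "null"; value: null }
  | { type: "boolean"; value: boolean }
  | { type: "number"; value: number }
  | { type: "string"; value: string }
  | { type: "array"; value: Value[] }

// Only null and false are falsy; every other value is truthy.
function isTruthy(v: Value): boolean {
  if (v.type === "null") return false
  if (v.type === "boolean") return v.value
  return true
}

console.log(isTruthy({ type: "number", value: 0 }))      // true, unlike JavaScript
console.log(isTruthy({ type: "string", value: "" }))     // true
console.log(isTruthy({ type: "array", value: [] }))      // true
console.log(isTruthy({ type: "boolean", value: false })) // false
console.log(isTruthy({ type: "null", value: null }))     // false
```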

No AND/OR opcodes: Short-circuit logical operations are implemented at the compiler level using JUMP patterns with DUP.
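
A compiler might emit the short-circuit pattern roughly as follows. This is a sketch under two assumptions that this file does not confirm: that JUMP_IF_FALSE pops the value it tests, and that a POP opcode exists. The helper name compileAnd and the label .and_end are hypothetical; check SPEC.md and GUIDE.md for the actual pattern.

```typescript
// Array-format tuples: [opcode], [opcode, operand], or a label definition [".label:"].
type Tuple = [string, (string | number)?]

// Emits `left and right`.
// ASSUMPTIONS: JUMP_IF_FALSE pops the value it tests, and POP exists.
function compileAnd(left: Tuple[], right: Tuple[], endLabel: string): Tuple[] {
  return [
    ...left,                      // stack: [left]
    ["DUP"],                      // stack: [left, left]
    ["JUMP_IF_FALSE", endLabel],  // falsy: skip right, left remains as the result
    ["POP"],                      // truthy: discard left
    ...right,                     // stack: [right]
    [`${endLabel}:`],             // label definition (colon suffix)
  ]
}

const code = compileAnd([["LOAD", "a"]], [["LOAD", "b"]], ".and_end")
for (const tuple of code) console.log(tuple.join(" "))
```

An `or` would follow the same shape with JUMP_IF_TRUE in place of JUMP_IF_FALSE.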

Tail call optimization: TAIL_CALL reuses the current call frame instead of pushing a new one, enabling unbounded recursion.
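
The effect on the call stack can be illustrated with a toy model. Nothing below is ReefVM code: the Frame shape and the maxDepth function are hypothetical, and the loop only simulates a function that calls itself in tail position.

```typescript
// Toy frame record; the real frame type lives in src/frame.ts.
interface Frame { fn: string; args: number[] }

// Simulates countdown(n) calling countdown(n - 1) in tail position until n reaches 0
// and reports the deepest call stack observed.
function maxDepth(n: number, useTailCall: boolean): number {
  const frames: Frame[] = [{ fn: "countdown", args: [n] }]
  let deepest = 1
  for (let k = n; k > 0; k--) {
    const next: Frame = { fn: "countdown", args: [k - 1] }
    if (useTailCall) frames[frames.length - 1] = next // TAIL_CALL: reuse the current frame
    else frames.push(next)                            // CALL: push a new frame
    deepest = Math.max(deepest, frames.length)
  }
  return deepest
}

console.log(maxDepth(100000, false)) // 100001
console.log(maxDepth(100000, true))  // 1
```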

Break semantics: CALL marks frames as break targets. BREAK unwinds the call stack to the most recent break target, enabling Ruby-style iterator patterns.

Exception handling: THROW jumps to finally (if present) or catch. The VM does NOT auto-jump to finally on successful try completion - compilers must explicitly generate JUMPs to finally blocks.

Parameter binding priority: Named args bind to fixed params first. Unmatched named args go to @named dict parameter. Fixed params bind in order: named arg > positional arg > default > null.
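
The priority order can be sketched as a standalone function. The names bindParams, Param, and extraNamed are hypothetical, and one detail is an assumption rather than something this file states: a parameter bound by name does not consume a positional argument. SPEC.md is the authority on the exact rules.

```typescript
type Arg = number | string | null
interface Param { name: string; default?: Arg }

// Binds each fixed param with the priority: named arg > positional arg > default > null.
// ASSUMPTION: a param bound by name does not consume a positional argument.
function bindParams(
  params: Param[],
  positional: Arg[],
  named: Record<string, Arg>,
): { bound: Record<string, Arg>; extraNamed: Record<string, Arg> } {
  const bound: Record<string, Arg> = {}
  const extraNamed: Record<string, Arg> = { ...named } // leftovers go to the @named dict
  let next = 0 // index of the next unused positional argument

  for (const p of params) {
    if (Object.prototype.hasOwnProperty.call(extraNamed, p.name)) {
      bound[p.name] = extraNamed[p.name]
      delete extraNamed[p.name]
    } else if (next < positional.length) {
      bound[p.name] = positional[next++]
    } else if (p.default !== undefined) {
      bound[p.name] = p.default
    } else {
      bound[p.name] = null
    }
  }
  return { bound, extraNamed }
}

// Mirrors MAKE_FUNCTION (x y=10)
const params: Param[] = [{ name: "x" }, { name: "y", default: 10 }]
console.log(bindParams(params, [5], { y: 20 }).bound) // x = 5, y = 20
console.log(bindParams(params, [5], {}).bound)        // x = 5, y = 10
console.log(bindParams(params, [], { z: 1 }))         // x = null, y = 10, z left in extraNamed
```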

Native function calling: Native functions are stored in scope and called via LOAD + CALL, using the same calling convention as Reef functions. Named arguments are supported by extracting parameter names from the function signature at call time.

Testing Strategy

Tests are organized by feature area:

opcodes.test.ts: Stack ops, arithmetic, comparisons, variables, control flow
functions.test.ts: Function creation, calls, closures, defaults, variadic, named args
tail-call.test.ts: Tail call optimization and unbounded recursion
exceptions.test.ts: Try/catch/finally, exception unwinding, nested handlers
native.test.ts: Native function interop (sync and async)
functions-parameter.test.ts: Convenience parameter for passing functions to run() and VM
bytecode.test.ts: Bytecode string parser, label resolution, constants
programmatic.test.ts: Array format API, typed tuples, labels, functions
validator.test.ts: Bytecode validation rules
unicode.test.ts: Unicode and emoji identifiers
regex.test.ts: RegExp support
examples.test.ts: Integration tests for example programs

When adding features:

Add unit tests for the specific opcode/feature
Add integration tests showing real-world usage
Update SPEC.md with formal specification
Update GUIDE.md with compiler patterns
Consider adding an example to examples/

Common Patterns

Writing Bytecode Tests

ReefVM supports two bytecode formats: string and array.

String format (human-readable):

import { toBytecode, run } from "#reef"

const bytecode = toBytecode(`
  PUSH 42
  STORE x
  LOAD x
  HALT
`)

const result = await run(bytecode)
// result is { type: 'number', value: 42 }

Array format (programmatic, type-safe):

import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],
  ["STORE", "x"],
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)
// result is { type: 'number', value: 42 }

Array format features:

Typed tuples for compile-time type checking
Labels defined as [".label:"] (single-element arrays with colon suffix)
Label references as strings: ["JUMP", ".label"] (no colon in references)
Function params as string arrays: ["MAKE_FUNCTION", ["x", "y=10"], ".body"]
See tests/programmatic.test.ts and examples/programmatic.ts for examples

Native Function Registration and Global Values

Option 1: Pass to run() or VM constructor (convenience)

const result = await run(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`,
  pi: 3.14159,
  config: { debug: true, port: 8080 }
})

// Or with VM constructor
const vm = new VM(bytecode, { add, greet, pi, config })

Option 2: Set values with vm.set() (manual)

const vm = new VM(bytecode)

// Set functions (auto-wrapped to native functions)
vm.set('add', (a: number, b: number) => a + b)

// Set any other values (auto-converted to ReefVM Values)
vm.set('pi', 3.14159)
vm.set('config', { debug: true, port: 8080 })

await vm.run()

Option 3: Set Value-based functions with vm.setValueFunction() (advanced)

For functions that work directly with ReefVM Value types:

const vm = new VM(bytecode)

// Set Value-based function (no wrapping, works directly with Values)
vm.setValueFunction('customOp', (a: Value, b: Value): Value => {
  return toValue(toNumber(a) + toNumber(b))
})

await vm.run()

Auto-wrapping handles:

Functions: wrapped as native functions with Value ↔ native type conversion
Sync and async functions
Arrays, objects, primitives, null, RegExp
All values converted via toValue()

Calling Functions from TypeScript

Use vm.call() to invoke Reef or native functions from TypeScript:

const bytecode = toBytecode(`
  MAKE_FUNCTION (x y=10) .add
  STORE add
  HALT

  .add:
    LOAD x
    LOAD y
    ADD
    RETURN
`)

const vm = new VM(bytecode, {
  log: (msg: string) => console.log(msg)  // Native function
})
await vm.run()

// Call Reef function with positional arguments
const result1 = await vm.call('add', 5, 3)  // → 8

// Call Reef function with named arguments (pass them as a final object argument)
const result2 = await vm.call('add', 5, { y: 20 })  // → 25

// Call Reef function with all named arguments
const result3 = await vm.call('add', { x: 10, y: 15 })  // → 25

// Call native function
await vm.call('log', 'Hello!')

How it works:

Looks up function (Reef or native) in VM scope
For Reef functions: converts to callable JavaScript function using fnFromValue
For native functions: calls directly
Automatically converts arguments to ReefVM Values
Converts result back to JavaScript types

Label Usage (Required for JUMP instructions)

All JUMP instructions must use labels:

JUMP .skip
PUSH 42
HALT
.skip:
  PUSH 99
  HALT
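
Label resolution can be pictured as a two-pass rewrite over the array format. The sketch below is not the code in bytecode.ts: resolveLabels is a hypothetical name, and the offset convention (relative to the instruction after the jump) is an assumption; SPEC.md defines the real one.

```typescript
// Array-format tuples: [opcode], [opcode, operand], or a label definition [".label:"].
type Tuple = [string, (string | number)?]

// Resolves label operands on JUMP-family opcodes to PC-relative offsets.
// ASSUMPTION: target = pc + 1 + offset.
function resolveLabels(program: Tuple[]): Tuple[] {
  const jumps = new Set(["JUMP", "JUMP_IF_FALSE", "JUMP_IF_TRUE"])
  const isLabelDef = (t: Tuple) =>
    t.length === 1 && t[0].startsWith(".") && t[0].endsWith(":")

  // Pass 1: record the instruction index each label points at.
  const addresses = new Map<string, number>()
  let pc = 0
  for (const t of program) {
    if (isLabelDef(t)) addresses.set(t[0].slice(0, -1), pc)
    else pc++
  }

  // Pass 2: drop label definitions and rewrite jump operands.
  const out: Tuple[] = []
  for (const t of program) {
    if (isLabelDef(t)) continue
    const [op, operand] = t
    if (jumps.has(op)) {
      if (typeof operand !== "string" || !addresses.has(operand)) {
        throw new Error(`${op} requires a defined label, got ${String(operand)}`)
      }
      out.push([op, addresses.get(operand)! - (out.length + 1)])
    } else {
      out.push(t)
    }
  }
  return out
}

const resolved = resolveLabels([
  ["JUMP", ".skip"],
  ["PUSH", 42],
  ["HALT"],
  [".skip:"],
  ["PUSH", 99],
  ["HALT"],
])
console.log(resolved[0]) // JUMP with offset 2: 0 + 1 + 2 = 3, the PUSH 99
```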

Function Definition Patterns

When defining functions, you MUST prevent the PC from falling through into function bodies. Two patterns:

Pattern 1: JUMP over function bodies (Recommended)

MAKE_FUNCTION (params) .body
STORE function_name
JUMP .end              ; Skip over function body
.body:
  <function code>
  RETURN
.end:
  <continue with program>

Pattern 2: Function bodies after HALT

MAKE_FUNCTION (params) .body
STORE function_name
<use the function>
HALT                   ; Stop before function bodies
.body:
  <function code>
  RETURN

Pattern 1 is required for:

Defining multiple functions before using them
REPL mode
Any case where execution continues after defining a function

Pattern 2 only works if you HALT before reaching function bodies.
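
One way to catch accidental fall-through in generated code is a small lint over the array format. The checker below is purely hypothetical (findFallThrough is not part of ReefVM) and uses a simple heuristic: the instruction just before a function body label must be JUMP, HALT, or RETURN.

```typescript
// Array-format tuples; MAKE_FUNCTION carries a params array and a body label.
type Tuple = [string, (string | number | string[])?, string?]

// Hypothetical lint: returns the body labels that execution could fall into.
function findFallThrough(program: Tuple[]): string[] {
  const exits = new Set(["JUMP", "HALT", "RETURN"])
  const bodies = new Set(
    program.filter(t => t[0] === "MAKE_FUNCTION").map(t => `${t[2]}:`),
  )
  const unsafe: string[] = []
  program.forEach((t, i) => {
    if (!bodies.has(t[0])) return
    const prev = program[i - 1]
    if (!prev || !exits.has(prev[0])) unsafe.push(t[0])
  })
  return unsafe
}

// Pattern 1: JUMP over the function body.
const good: Tuple[] = [
  ["MAKE_FUNCTION", ["x"], ".body"],
  ["STORE", "double"],
  ["JUMP", ".end"],
  [".body:"],
  ["LOAD", "x"],
  ["LOAD", "x"],
  ["ADD"],
  ["RETURN"],
  [".end:"],
  ["HALT"],
]
const bad = good.filter(t => t[0] !== "JUMP") // same program without the JUMP

console.log(findFallThrough(good).length) // 0
console.log(findFallThrough(bad).length)  // 1
```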

REPL Mode (Incremental Execution)

For building REPLs (like the Shrimp REPL), use vm.continue() and vm.appendBytecode():

const vm = new VM(toBytecode([]), natives)
await vm.run()  // Initialize (empty bytecode)

// User enters: x = 42
const line1 = compileLine("x = 42")  // No HALT!
vm.appendBytecode(line1)
await vm.continue()  // Execute only line 1

// User enters: x + 10
const line2 = compileLine("x + 10")  // No HALT!
vm.appendBytecode(line2)
await vm.continue()  // Execute only line 2, result is 52

Key points:

vm.run() resets PC to 0 (re-executes everything) - use for initial setup only
vm.continue() resumes from current PC (executes only new bytecode)
vm.appendBytecode(bytecode) properly handles constant index remapping
Don't use HALT in REPL lines - let VM stop naturally
Scope and variables persist across all lines
Side effects run only once, because vm.continue() never re-executes earlier lines
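
The difference between run() and continue() can be seen in a toy model. ToyVM below is a hypothetical stand-in, not the real VM class: each "instruction" is just a function, and constant index remapping is not modelled.

```typescript
// Toy model: a program is a list of functions and the PC is an index into it.
class ToyVM {
  private pc = 0
  constructor(private code: Array<() => void> = []) {}

  appendBytecode(more: Array<() => void>): void {
    this.code.push(...more)
  }

  // Resets the PC and executes everything from the start.
  run(): void {
    this.pc = 0
    this.continue()
  }

  // Resumes from the current PC and stops when it runs off the end.
  continue(): void {
    while (this.pc < this.code.length) this.code[this.pc++]()
  }
}

const log: string[] = []
const vm = new ToyVM()
vm.run()                                      // empty program, PC stays at 0
vm.appendBytecode([() => log.push("line 1")])
vm.continue()                                 // runs line 1
vm.appendBytecode([() => log.push("line 2")])
vm.continue()                                 // runs only line 2
console.log(log.length)                       // 2
```

Calling vm.run() again at this point would replay both lines, which is why run() is reserved for initial setup.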

TypeScript Configuration

Import alias: #reef maps to ./src/index.ts
Module system: ES modules ("type": "module" in package.json)
Bun automatically handles TypeScript compilation

Bun-Specific Notes

Use bun instead of node, npm, pnpm, or vite
No need for dotenv - Bun loads .env automatically
Prefer Bun APIs over Node.js equivalents when available
See .cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc for detailed Bun usage

Adding a New OpCode

When adding a new instruction to ReefVM, you must update multiple files in a specific order. Follow this checklist:

1. Define the OpCode (src/opcode.ts)

Add the new opcode to the OpCode enum with comprehensive documentation:

export enum OpCode {
  // ... existing opcodes

  MY_NEW_OP,  // operand: <type> | stack: [inputs] → [outputs]
              // Description of what it does
              // Any important behavioral notes
}

2. Implement VM Execution (src/vm.ts)

Add a case to the execute() method's switch statement:

async execute(instruction: Instruction) {
  switch (instruction.op) {
    // ... existing cases

    case OpCode.MY_NEW_OP:
      // Implementation
      // - Pop values from this.stack as needed
      // - Perform the operation
      // - Push results to this.stack
      // - Throw errors for invalid operations
      // - Use await for async operations
      break
  }
}

Common helper methods:

this.binaryOp((a, b) => ...) - For binary arithmetic/comparison
toNumber(value), toString(value), isTrue(value), isEqual(a, b) - Type coercion
this.scope.get(name), this.scope.set(name, value) - Variable access

3. Update Validator (src/validator.ts)

Add the opcode to the appropriate set:

// If your opcode requires an operand:
const OPCODES_WITH_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])

// If your opcode takes no operand:
const OPCODES_WITHOUT_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])

If your opcode has complex operand validation, add a specific check in the validation loop around line 154.

4. Update Array API (src/bytecode.ts)

Add your instruction to the InstructionTuple type:

type InstructionTuple =
  // ... existing types
  | ["MY_NEW_OP"]                    // No operand
  | ["MY_NEW_OP", string]            // String operand
  | ["MY_NEW_OP", number]            // Number operand
  | ["MY_NEW_OP", string, number]    // Multiple operands

If your opcode has special operand handling, add a case in toBytecodeFromArray() around line 241.

5. Write Tests (REQUIRED)

Create tests in the appropriate test file:

// tests/opcodes.test.ts, tests/functions.test.ts, etc.
import { test, expect } from "bun:test"
import { toBytecode, run } from "#reef"

test("MY_NEW_OP description", async () => {
  const bytecode = toBytecode([
    // Setup
    ["PUSH", 42],
    ["MY_NEW_OP"],
    ["HALT"]
  ])

  const result = await run(bytecode)
  expect(result).toEqual({ type: "number", value: 42 })
})

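// Test edge cases
test("MY_NEW_OP with invalid input", async () => {
  // Test error conditions, e.g. running the opcode on an empty stack
  const bytecode = toBytecode([
    ["MY_NEW_OP"],
    ["HALT"]
  ])

  await expect(run(bytecode)).rejects.toThrow()
})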

ALWAYS write tests. Test both success cases and error conditions. Add integration tests showing real-world usage.

6. Document Specification (SPEC.md)

Add a formal specification entry:

#### MY_NEW_OP

**Operand**: `<type>`
**Stack**: `[input] → [output]`

Description of what the instruction does.

**Behavior**:
- Specific behavior point 1
- Specific behavior point 2

**Errors**:
- Error condition 1
- Error condition 2

7. Update Compiler Guide (GUIDE.md)

If your opcode introduces new patterns, add examples to GUIDE.md:

### New Pattern Name

\```
PUSH value
MY_NEW_OP
STORE result
\```

Description of the pattern and when to use it.

8. Add Examples (Optional)

If your opcode enables new functionality, add an example to examples/:

// examples/my_feature.reef or examples/my_feature.ts
const example = toBytecode([
  // Demonstrate the new opcode
])

Checklist Summary

When adding an opcode, update in this order:

1. src/opcode.ts - Add enum value with docs
2. src/vm.ts - Implement execution logic
3. src/validator.ts - Add to operand requirement set
4. src/bytecode.ts - Add to InstructionTuple type
5. tests/*.test.ts - Write comprehensive tests (REQUIRED)
6. SPEC.md - Document formal specification
7. GUIDE.md - Add compiler patterns (if applicable)
8. examples/ - Add example code (if applicable)

Run bun test to verify all tests pass before committing.

Common Gotchas

Label requirements: JUMP/JUMP_IF_FALSE/JUMP_IF_TRUE require label operands (.label), not numeric offsets. The bytecode compiler resolves labels to PC-relative offsets internally. PUSH_TRY/PUSH_FINALLY can use either labels or absolute instruction indices (#N).

Stack operations: Most binary operations pop in reverse order (second operand is popped first, then first operand).
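
The pop order matters for non-commutative operations. The standalone sketch below takes the stack as a parameter for illustration; the real helper is a VM method, this.binaryOp, that takes only the callback.

```typescript
// The right operand is on top of the stack, so it is popped first.
function binaryOp(stack: number[], fn: (a: number, b: number) => number): void {
  const b = stack.pop() // second operand
  const a = stack.pop() // first operand
  if (a === undefined || b === undefined) throw new Error("stack underflow")
  stack.push(fn(a, b))
}

const stack: number[] = []
stack.push(10) // PUSH 10
stack.push(3)  // PUSH 3
binaryOp(stack, (a, b) => a - b) // a subtraction-style op computes 10 - 3, not 3 - 10
console.log(stack[0]) // 7
```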

MAKE_ARRAY operand: Specifies count, not a stack index. MAKE_ARRAY #3 pops 3 items.
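
A count operand behaves like the sketch below. The function makeArray is hypothetical, and the element order (items keep their push order) is an assumption; SPEC.md is the authority.

```typescript
// Pops `count` items off the stack and returns them as one array.
// ASSUMPTION: items keep their push order (the deepest popped item becomes element 0).
function makeArray<T>(stack: T[], count: number): T[] {
  if (count > stack.length) throw new Error("stack underflow")
  return stack.splice(stack.length - count, count)
}

const stack = ["a", "b", "c", "d"]
const arr = makeArray(stack, 3) // like MAKE_ARRAY #3
console.log(arr.join(","))      // b,c,d
console.log(stack.join(","))    // a
```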

Finally blocks: The compiler must generate explicit JUMPs to finally blocks for successful try/catch completion. The VM only auto-jumps to finally on THROW.

Variable scoping: STORE updates an existing variable if one is found in the current scope or any parent scope; otherwise it creates the variable in the current scope. It does NOT shadow by default.
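
A minimal sketch of this lookup-then-create behaviour, using a simplified stand-in for the scope chain (the real class is in src/scope.ts and may differ in API and details):

```typescript
// Simplified scope chain: each scope holds its own variables and a parent link.
class Scope {
  private vars = new Map<string, unknown>()
  constructor(private parent?: Scope) {}

  get(name: string): unknown {
    if (this.vars.has(name)) return this.vars.get(name)
    return this.parent?.get(name)
  }

  // STORE semantics: update the nearest scope that already defines `name`,
  // otherwise create the variable in the current scope.
  set(name: string, value: unknown): void {
    let scope: Scope | undefined = this
    while (scope) {
      if (scope.vars.has(name)) {
        scope.vars.set(name, value)
        return
      }
      scope = scope.parent
    }
    this.vars.set(name, value)
  }
}

const globals = new Scope()
const fnScope = new Scope(globals)
globals.set("x", 1)
fnScope.set("x", 2) // updates the global x instead of shadowing it
fnScope.set("y", 3) // creates y in fnScope only
console.log(globals.get("x"), globals.get("y")) // 2 undefined
```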

Identifiers: Variable and parameter names support Unicode and emoji! Valid: 💎, 🌟, 変数, counter. Invalid: cannot start with digits or special prefixes (., #, @, ...), cannot contain whitespace or syntax characters.
