Chris Wanstrath 78923b3eff more natural native functions

2025-10-08 09:57:49 -07:00

12 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

ReefVM is a stack-based bytecode virtual machine for the Shrimp programming language. It implements a complete VM with closures, tail call optimization, exception handling, variadic functions, named parameters, and Ruby-style iterators with break/continue.

Essential reading: Before making changes, read README.md, SPEC.md, and GUIDE.md to understand the VM architecture, instruction set, and compiler patterns.

Development Commands

Running Files

bun <file.ts>              # Run TypeScript files directly
bun examples/native.ts     # Run example

Testing

bun test                   # Run all tests
bun test <file>            # Run specific test file
bun test --watch           # Watch mode

Tools

./bin/reef <file.reef>     # Execute bytecode file
./bin/validate <file.reef> # Validate bytecode
./bin/debug <file.reef>    # Step-by-step debugger
./bin/repl                 # Interactive REPL

Building

No build step required - Bun runs TypeScript directly.

Architecture

Core Components

VM Execution Model (src/vm.ts):

Stack-based execution with program counter (PC)
Call stack for function frames
Exception handler stack for try/catch/finally
Lexical scope chain with parent references
Native function registry for TypeScript interop

Key subsystems:

bytecode.ts: Compiler that converts both string and array formats to executable bytecode. Handles label resolution, constant pool management, and function definition parsing. The toBytecode() function accepts either a string (human-readable) or typed array format (programmatic).
value.ts: Tagged union Value type system with type coercion functions (toNumber, toString, isTrue, isEqual)
scope.ts: Linked scope chain for variable resolution with lexical scoping
frame.ts: Call frame tracking for function calls and break targets
exception.ts: Exception handler records for try/catch/finally blocks
validator.ts: Bytecode validation to catch common errors before execution
opcode.ts: OpCode enum defining all VM instructions

Critical Design Decisions

Relative jumps: All JUMP instructions use PC-relative offsets (not absolute addresses), making bytecode position-independent. PUSH_TRY/PUSH_FINALLY use absolute addresses.

Truthiness semantics: Only null and false are falsy. Unlike JavaScript, 0, "", empty arrays, and empty dicts are truthy.
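The truthiness rule can be sketched as a small standalone function. This is an illustration, not the code in src/value.ts: only the `{ type, value }` shape for numbers appears in this file, so the other variants of the `Value` union below are assumptions.

```typescript
// Hypothetical Value union for illustration. Only the number shape
// ({ type: "number", value: 42 }) is confirmed by this document.
type Value =
  | { type: "null"; value: null }
  | { type: "boolean"; value: boolean }
  | { type: "number"; value: number }
  | { type: "string"; value: string }
  | { type: "array"; value: Value[] }

// Only null and false are falsy; every other value is truthy.
function isTrue(v: Value): boolean {
  if (v.type === "null") return false
  if (v.type === "boolean") return v.value
  return true
}

const zero: Value = { type: "number", value: 0 }
const emptyString: Value = { type: "string", value: "" }
const nothing: Value = { type: "null", value: null }

const results = [isTrue(zero), isTrue(emptyString), isTrue(nothing)]
// results is [true, true, false]
```

A compiler targeting ReefVM therefore cannot rely on `0` or `""` to end a loop or fail a condition.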

No AND/OR opcodes: Short-circuit logical operations are implemented at the compiler level using JUMP patterns with DUP.
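One way a compiler might lower `a and b` with DUP and a conditional jump is sketched below, using a toy interpreter rather than ReefVM itself. The sketch assumes that JUMP_IF_FALSE pops its condition, that offsets are relative to the next instruction, and that a POP opcode exists; check SPEC.md before relying on any of these.

```typescript
// Toy interpreter for illustration only. It is NOT ReefVM.
type Val = number | string | boolean | null

type Instr =
  | ["PUSH", Val]
  | ["DUP"]
  | ["POP"]
  | ["JUMP_IF_FALSE", number] // offset relative to the next instruction (assumed)

function runTiny(program: Instr[]): Val | undefined {
  const stack: Val[] = []
  let pc = 0
  while (pc < program.length) {
    const instr = program[pc]
    pc += 1 // pc now points at the next instruction
    switch (instr[0]) {
      case "PUSH":
        stack.push(instr[1])
        break
      case "DUP":
        stack.push(stack[stack.length - 1])
        break
      case "POP":
        stack.pop()
        break
      case "JUMP_IF_FALSE": {
        const cond = stack.pop() // assumed: the condition is consumed
        if (cond === null || cond === false) pc += instr[1]
        break
      }
    }
  }
  return stack[stack.length - 1]
}

// Hypothetical lowering of `a and b` for two constant operands.
function compileAnd(a: Val, b: Val): Instr[] {
  return [
    ["PUSH", a],
    ["DUP"],              // keep one copy of a as the potential result
    ["JUMP_IF_FALSE", 2], // a is falsy: skip POP and PUSH b, a stays on the stack
    ["POP"],              // a is truthy: discard the remaining copy of a
    ["PUSH", b],          // the result is b
  ]
}

const shortCircuited = runTiny(compileAnd(false, 2)) // false
const evaluatedBoth = runTiny(compileAnd(0, 2))      // 2, because 0 is truthy
```

An `or` follows the same shape with JUMP_IF_TRUE in place of JUMP_IF_FALSE.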

Tail call optimization: TAIL_CALL reuses the current call frame instead of pushing a new one, enabling unbounded recursion.

Break semantics: CALL marks frames as break targets. BREAK unwinds the call stack to the most recent break target, enabling Ruby-style iterator patterns.

Exception handling: THROW jumps to finally (if present) or catch. The VM does NOT auto-jump to finally on successful try completion - compilers must explicitly generate JUMPs to finally blocks.

Parameter binding priority: Named args bind to fixed params first. Unmatched named args go to the @named dict parameter. Each fixed param takes its value from the first available source, in this order: named arg > positional arg > default > null.
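The binding order can be sketched as a standalone function. The names `ParamSpec`, `Bound`, and `bindParams` are hypothetical, and the sketch assumes that positional args are consumed left to right only by params that were not already bound by name; the VM's actual handling of that case should be confirmed in SPEC.md.

```typescript
type Arg = number | string | boolean | null

// Hypothetical parameter description; not ReefVM's internal representation.
interface ParamSpec {
  name: string
  default?: Arg
}

interface Bound {
  locals: Record<string, Arg>
  named: Record<string, Arg> // leftovers destined for an @named-style dict
}

function bindParams(
  params: ParamSpec[],
  positional: Arg[],
  namedArgs: Record<string, Arg>
): Bound {
  const locals: Record<string, Arg> = {}
  const leftover: Record<string, Arg> = { ...namedArgs }
  let next = 0 // index of the next unused positional arg

  for (const p of params) {
    if (Object.prototype.hasOwnProperty.call(leftover, p.name)) {
      locals[p.name] = leftover[p.name] // 1. named arg wins
      delete leftover[p.name]
    } else if (next < positional.length) {
      locals[p.name] = positional[next] // 2. then the next positional arg
      next += 1
    } else if (p.default !== undefined) {
      locals[p.name] = p.default // 3. then the declared default
    } else {
      locals[p.name] = null // 4. otherwise null
    }
  }

  return { locals, named: leftover }
}

const bound = bindParams(
  [{ name: "x" }, { name: "y", default: 10 }, { name: "z" }],
  [1],
  { z: 3, extra: 9 }
)
// bound.locals is { x: 1, y: 10, z: 3 }; bound.named is { extra: 9 }
```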

Native function calling: CALL_NATIVE consumes the entire stack as arguments (different from CALL which pops specific argument counts).

Testing Strategy

Tests are organized by feature area:

basic.test.ts: Stack ops, arithmetic, comparisons, variables, control flow
functions.test.ts: Function creation, calls, closures, defaults, variadic, named args
tail-call.test.ts: Tail call optimization and unbounded recursion
exceptions.test.ts: Try/catch/finally, exception unwinding, nested handlers
native.test.ts: Native function interop (sync and async)
bytecode.test.ts: Bytecode string parser, label resolution, constants
programmatic.test.ts: Array format API, typed tuples, labels, functions
validator.test.ts: Bytecode validation rules
examples.test.ts: Integration tests for example programs

When adding features:

Add unit tests for the specific opcode/feature
Add integration tests showing real-world usage
Update SPEC.md with formal specification
Update GUIDE.md with compiler patterns
Consider adding an example to examples/

Common Patterns

Writing Bytecode Tests

ReefVM supports two bytecode formats: string and array.

String format (human-readable):

import { toBytecode, run } from "#reef"

const bytecode = toBytecode(`
  PUSH 42
  STORE x
  LOAD x
  HALT
`)

const result = await run(bytecode)
// result is { type: 'number', value: 42 }

Array format (programmatic, type-safe):

import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],
  ["STORE", "x"],
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)
// result is { type: 'number', value: 42 }

Array format features:

Typed tuples for compile-time type checking
Labels defined as [".label:"] (single-element arrays with colon suffix)
Label references as strings: ["JUMP", ".label"] (no colon in references)
Function params as string arrays: ["MAKE_FUNCTION", ["x", "y=10"], ".body"]
See tests/programmatic.test.ts and examples/programmatic.ts for examples

Native Function Registration

ReefVM supports two ways to register native functions:

1. Native TypeScript functions (recommended) - Auto-converts between native TS and ReefVM types:

const vm = new VM(bytecode)

// Works with native TypeScript types!
vm.registerFunction('add', (a: number, b: number) => {
  return a + b
})

// Supports defaults (like NOSE commands)
vm.registerFunction('ls', (path: string, link = false) => {
  return link ? `listing ${path} with links` : `listing ${path}`
})

// Async functions work too
vm.registerFunction('fetch', async (url: string) => {
  const response = await fetch(url)
  return await response.text()
})

await vm.run()

2. Value-based functions (manual) - For functions that need direct Value access:

const vm = new VM(bytecode)

vm.registerValueFunction('customOp', (a: Value, b: Value): Value => {
  // Direct access to Value types
  return toValue(toNumber(a) + toNumber(b))
})

await vm.run()

The auto-wrapping handles:

Converting Value → native types on input (using fromValue)
Converting native types → Value on output (using toValue)
Both sync and async functions
Arrays, objects, primitives, and null
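The wrapping idea can be sketched in a few lines. This is a simplified stand-in, not the implementation behind registerFunction: it covers only synchronous functions and primitive values, the `Value` variants other than number are assumed, and the helper name `wrap` is hypothetical.

```typescript
// Assumed Value shapes; only { type: "number", value } is confirmed by this file.
type Value =
  | { type: "null"; value: null }
  | { type: "boolean"; value: boolean }
  | { type: "number"; value: number }
  | { type: "string"; value: string }

type Native = null | boolean | number | string

// Native -> Value, applied to the function's return value.
function toValue(native: Native): Value {
  if (native === null) return { type: "null", value: null }
  if (typeof native === "boolean") return { type: "boolean", value: native }
  if (typeof native === "number") return { type: "number", value: native }
  return { type: "string", value: native }
}

// Value -> Native, applied to each incoming argument.
function fromValue(v: Value): Native {
  return v.value
}

// Hypothetical helper: turns a plain function into one that speaks Value.
function wrap(fn: (...args: any[]) => Native): (...args: Value[]) => Value {
  return (...args) => toValue(fn(...args.map(fromValue)))
}

const add = wrap((a: number, b: number) => a + b)
const sum = add({ type: "number", value: 2 }, { type: "number", value: 3 })
// sum is { type: 'number', value: 5 }
```

Because the wrapped function is called with plain arguments, ordinary TypeScript default parameters apply whenever fewer arguments are supplied.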

Label Usage (Preferred)

Use labels instead of numeric offsets for readability:

JUMP .skip
PUSH 42
HALT
.skip:
  PUSH 99
  HALT
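To show what labels save you from computing by hand, here is a rough sketch of label resolution for the snippet above. It is not the parser in src/bytecode.ts: `resolveLabels` and `Resolved` are hypothetical names, and operands are split naively on whitespace. It follows the rules stated in this file: JUMP-family offsets are relative to the next instruction (PC + 1), and any other label operand resolves to an absolute index.

```typescript
interface Resolved {
  op: string
  operand?: string | number
}

const RELATIVE_JUMPS = new Set(["JUMP", "JUMP_IF_FALSE", "JUMP_IF_TRUE"])

function resolveLabels(source: string): Resolved[] {
  const labels = new Map<string, number>()
  const lines: string[] = []

  // First pass: record each label's instruction index and drop label lines.
  for (const raw of source.split("\n")) {
    const line = raw.trim()
    if (line === "") continue
    if (line.startsWith(".") && line.endsWith(":")) {
      labels.set(line.slice(0, -1), lines.length) // index of the next instruction
    } else {
      lines.push(line)
    }
  }

  // Second pass: replace label references with numeric operands.
  return lines.map((line, pc) => {
    const parts = line.split(/\s+/)
    const op = parts[0]
    const operand = parts[1]
    if (operand === undefined) return { op }
    if (!operand.startsWith(".")) return { op, operand } // non-label operands stay as text
    const target = labels.get(operand)
    if (target === undefined) throw new Error(`Unknown label: ${operand}`)
    // Relative jumps count from the instruction after the jump (pc + 1).
    return { op, operand: RELATIVE_JUMPS.has(op) ? target - (pc + 1) : target }
  })
}

const resolved = resolveLabels(`
  JUMP .skip
  PUSH 42
  HALT
  .skip:
    PUSH 99
    HALT
`)
// resolved[0] is { op: "JUMP", operand: 2 }: from index 1, skip two instructions to index 3
```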

TypeScript Configuration

Import alias: #reef maps to ./src/index.ts
Module system: ES modules ("type": "module" in package.json)
Bun automatically handles TypeScript compilation

Bun-Specific Notes

Use bun instead of node, npm, pnpm, or vite
No need for dotenv - Bun loads .env automatically
Prefer Bun APIs over Node.js equivalents when available
See .cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc for detailed Bun usage

Adding a New OpCode

When adding a new instruction to ReefVM, you must update multiple files in a specific order. Follow this checklist:

1. Define the OpCode (src/opcode.ts)

Add the new opcode to the OpCode enum with comprehensive documentation:

export enum OpCode {
  // ... existing opcodes

  MY_NEW_OP,  // operand: <type> | stack: [inputs] → [outputs]
              // Description of what it does
              // Any important behavioral notes
}

2. Implement VM Execution (src/vm.ts)

Add a case to the execute() method's switch statement:

async execute(instruction: Instruction) {
  switch (instruction.op) {
    // ... existing cases

    case OpCode.MY_NEW_OP:
      // Implementation
      // - Pop values from this.stack as needed
      // - Perform the operation
      // - Push results to this.stack
      // - Throw errors for invalid operations
      // - Use await for async operations
      break
  }
}

Common helper methods:

this.binaryOp((a, b) => ...) - For binary arithmetic/comparison
toNumber(value), toString(value), isTrue(value), isEqual(a, b) - Type coercion
this.scope.get(name), this.scope.set(name, value) - Variable access

3. Update Validator (src/validator.ts)

Add the opcode to the appropriate set:

// If your opcode requires an operand:
const OPCODES_WITH_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])

// If your opcode takes no operand:
const OPCODES_WITHOUT_OPERANDS = new Set([
  // ... existing
  OpCode.MY_NEW_OP,
])

If your opcode has complex operand validation, add a specific check in the validation loop around line 154.

4. Update Array API (src/bytecode.ts)

Add your instruction to the InstructionTuple type:

type InstructionTuple =
  // ... existing types
  | ["MY_NEW_OP"]                    // No operand
  | ["MY_NEW_OP", string]            // String operand
  | ["MY_NEW_OP", number]            // Number operand
  | ["MY_NEW_OP", string, number]    // Multiple operands

If your opcode has special operand handling, add a case in toBytecodeFromArray() around line 241.

5. Write Tests (REQUIRED)

Create tests in the appropriate test file:

// tests/basic.test.ts, tests/functions.test.ts, etc.

test("MY_NEW_OP description", async () => {
  const bytecode = toBytecode([
    // Setup
    ["PUSH", 42],
    ["MY_NEW_OP"],
    ["HALT"]
  ])

  const result = await run(bytecode)
  expect(result).toEqual({ type: "number", value: 42 })
})

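// Test edge cases
test("MY_NEW_OP with invalid input", async () => {
  // Test error conditions. Running the op on an empty stack is one
  // hypothetical failure case; substitute whatever input your opcode rejects.
  const bytecode = toBytecode([
    ["MY_NEW_OP"],
    ["HALT"]
  ])

  await expect(run(bytecode)).rejects.toThrow()
})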

ALWAYS write tests. Test both success cases and error conditions. Add integration tests showing real-world usage.

6. Document Specification (SPEC.md)

Add a formal specification entry:

#### MY_NEW_OP

**Operand**: `<type>`
**Stack**: `[input] → [output]`

Description of what the instruction does.

**Behavior**:
- Specific behavior point 1
- Specific behavior point 2

**Errors**:
- Error condition 1
- Error condition 2

7. Update Compiler Guide (GUIDE.md)

If your opcode introduces new patterns, add examples to GUIDE.md:

### New Pattern Name

\```
PUSH value
MY_NEW_OP
STORE result
\```

Description of the pattern and when to use it.

8. Add Examples (Optional)

If your opcode enables new functionality, add an example to examples/:

// examples/my_feature.reef or examples/my_feature.ts
const example = toBytecode([
  // Demonstrate the new opcode
])

Checklist Summary

When adding an opcode, update in this order:

1. src/opcode.ts - Add enum value with docs
2. src/vm.ts - Implement execution logic
3. src/validator.ts - Add to operand requirement set
4. src/bytecode.ts - Add to InstructionTuple type
5. tests/*.test.ts - Write comprehensive tests (REQUIRED)
6. SPEC.md - Document formal specification
7. GUIDE.md - Add compiler patterns (if applicable)
8. examples/ - Add example code (if applicable)

Run bun test to verify all tests pass before committing.

Common Gotchas

Jump offsets: JUMP/JUMP_IF_FALSE/JUMP_IF_TRUE use relative offsets from the next instruction (PC + 1). PUSH_TRY/PUSH_FINALLY use absolute instruction indices.

Stack operations: Most binary operations pop in reverse order (second operand is popped first, then first operand).
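A minimal sketch of that pop order, using a plain number stack. The standalone `binaryOp` below only mirrors the idea of the VM's this.binaryOp helper; its signature is an assumption.

```typescript
// Applies fn to the top two stack values and pushes the result.
function binaryOp(stack: number[], fn: (a: number, b: number) => number): void {
  const b = stack.pop() // second operand: pushed last, popped first
  const a = stack.pop() // first operand: pushed first, popped second
  if (a === undefined || b === undefined) throw new Error("Stack underflow")
  stack.push(fn(a, b))
}

// Equivalent of PUSH 10, PUSH 3, SUB
const stack = [10, 3]
binaryOp(stack, (a, b) => a - b)
// stack is now [7], not [-7]
```

The order only matters for non-commutative operations such as subtraction, division, and comparisons.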

MAKE_ARRAY operand: Specifies count, not a stack index. MAKE_ARRAY #3 pops 3 items.
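The count semantics can be pictured with a short sketch. The helper name `makeArray` is hypothetical, the sketch returns the items instead of pushing an array Value back, and it assumes the items keep their push order.

```typescript
// Removes the top `count` items from the stack and returns them in push order.
function makeArray(stack: number[], count: number): number[] {
  if (count < 0 || count > stack.length) throw new Error("Invalid MAKE_ARRAY count")
  return stack.splice(stack.length - count, count)
}

const stack = [1, 2, 3, 4]
const items = makeArray(stack, 3)
// items is [2, 3, 4]; stack is [1]
```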

CALL_NATIVE stack behavior: Unlike CALL, it consumes all stack values as arguments and clears the stack.

Finally blocks: The compiler must generate explicit JUMPs to finally blocks for successful try/catch completion. The VM only auto-jumps to finally on THROW.

Variable scoping: STORE updates an existing variable if one is found anywhere along the scope chain (current scope first, then parents); otherwise it creates the variable in the current scope. It does NOT shadow by default.
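A simplified model of this rule, using a hypothetical `Scope` class that stores plain numbers. The real class in src/scope.ts may be shaped differently.

```typescript
class Scope {
  private vars = new Map<string, number>()
  private parent: Scope | undefined

  constructor(parent?: Scope) {
    this.parent = parent
  }

  // Looks up a name in this scope, then in each parent in turn.
  get(name: string): number | undefined {
    if (this.vars.has(name)) return this.vars.get(name)
    return this.parent === undefined ? undefined : this.parent.get(name)
  }

  // STORE-like assignment: update the nearest scope that already defines
  // the name; if none does, create the variable in this scope.
  set(name: string, value: number): void {
    let scope: Scope | undefined = this
    while (scope !== undefined) {
      if (scope.vars.has(name)) {
        scope.vars.set(name, value)
        return
      }
      scope = scope.parent
    }
    this.vars.set(name, value)
  }
}

const outer = new Scope()
outer.set("x", 1)

const inner = new Scope(outer)
inner.set("x", 2) // updates outer's x; no shadowing
inner.set("y", 3) // y is new, so it is created in inner

// outer.get("x") is 2; outer.get("y") is undefined; inner.get("y") is 3
```

This is why a closure that STOREs to a captured variable changes it for the enclosing function as well.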

Identifiers: Variable and parameter names support Unicode and emoji! Valid: 💎, 🌟, 変数, counter. Invalid: names that start with a digit or a special prefix (., #, @, ...), and names that contain whitespace or syntax characters.
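As a rough illustration, the rules that are spelled out here can be checked like this. `isValidIdentifier` is a hypothetical helper, not part of ReefVM, and it does not check "syntax characters" because this file does not list them.

```typescript
// Checks only the rules stated explicitly above; the set of forbidden
// "syntax characters" is not enumerated here, so it is not enforced.
function isValidIdentifier(name: string): boolean {
  if (name.length === 0) return false
  if (/^[0-9]/.test(name)) return false // cannot start with a digit
  if (/^[.#@]/.test(name)) return false // special prefixes; "..." also starts with "."
  if (/\s/.test(name)) return false     // cannot contain whitespace
  return true
}

const checks = ["💎", "変数", "counter", "1abc", "@named", "...rest", "a b"].map(isValidIdentifier)
// checks is [true, true, true, false, false, false, false]
```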
