ReefVM/GUIDE.md

# Reef Compiler Guide

Quick reference for compiling to Reef bytecode.

## Bytecode Formats

ReefVM supports two bytecode formats:

1. **String format**: Human-readable text with opcodes and operands
2. **Array format**: TypeScript arrays with typed tuples for programmatic generation

Both formats are compiled using the same `toBytecode()` function.

## Bytecode Syntax

### Instructions
```
OPCODE operand     ; comment
```

### Operand Types

**Immediate numbers** (`#N`): Counts, relative offsets, or absolute instruction indices
- `MAKE_ARRAY #3` - count of 3 items
- `JUMP #5` - relative offset of 5 instructions (prefer labels)
- `PUSH_TRY #10` - absolute instruction index (prefer labels)

**Labels** (`.name`): Symbolic addresses resolved at parse time
- `.label:` - define label at current position
- `JUMP .loop` - jump to label
- `MAKE_FUNCTION (x) .body` - function body at label

**Variable names**: Plain identifiers (supports Unicode and emoji!)
- `LOAD counter` - load variable
- `STORE result` - store variable
- `LOAD 💎` - load emoji variable
- `STORE 変数` - store Unicode variable

**Constants**: Literals added to constants pool
- Numbers: `PUSH 42`, `PUSH 3.14`
- Strings: `PUSH "hello"` or `PUSH 'world'`
- Booleans: `PUSH true`, `PUSH false`
- Null: `PUSH null`

## Array Format

The programmatic array format uses TypeScript tuples for type safety:

```typescript
import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],        // Atom values: number | string | boolean | null
  ["STORE", "x"],      // Variable names as strings
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)
```

### Operand Types in Array Format

**Atoms** (`number | string | boolean | null`): Constants for PUSH
```typescript
["PUSH", 42]
["PUSH", "hello"]
["PUSH", true]
["PUSH", null]
```

**Variable names**: String identifiers
```typescript
["LOAD", "counter"]
["STORE", "result"]
```

**Label definitions**: Single-element arrays whose string starts with `.` and ends with `:`
```typescript
[".loop:"]
[".end:"]
[".function_body:"]
```

**Label references**: Strings in jump/function instructions
```typescript
["JUMP", ".loop"]
["JUMP_IF_FALSE", ".end"]
["MAKE_FUNCTION", ["x", "y"], ".body"]
["PUSH_TRY", ".catch"]
```

**Counts**: Numbers for array/dict construction
```typescript
["MAKE_ARRAY", 3]    // Pop 3 items
["MAKE_DICT", 2]     // Pop 2 key-value pairs
```

### Functions in Array Format

```typescript
// Basic function
["MAKE_FUNCTION", ["x", "y"], ".body"]

// With defaults
["MAKE_FUNCTION", ["x", "y=10"], ".body"]

// Variadic
["MAKE_FUNCTION", ["...args"], ".body"]

// Named args
["MAKE_FUNCTION", ["@opts"], ".body"]

// Mixed
["MAKE_FUNCTION", ["x", "y=5", "...rest", "@opts"], ".body"]
```

### Complete Example

```typescript
const factorial = toBytecode([
  ["MAKE_FUNCTION", ["n", "acc=1"], ".fact"],
  ["STORE", "factorial"],
  ["JUMP", ".main"],

  [".fact:"],
  ["LOAD", "n"],
  ["PUSH", 0],
  ["LTE"],
  ["JUMP_IF_FALSE", ".recurse"],
  ["LOAD", "acc"],
  ["RETURN"],

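  [".recurse:"],
  ["LOAD", "factorial"],
  ["LOAD", "n"],
  ["PUSH", 1],
  ["SUB"],             // first arg: n - 1
  ["LOAD", "n"],
  ["LOAD", "acc"],
  ["MUL"],             // second arg: n * acc
  ["PUSH", 2],         // positional count
  ["PUSH", 0],         // named count
  ["TAIL_CALL"],

  [".main:"],
  ["LOAD", "factorial"],
  ["PUSH", 5],         // n = 5 (acc falls back to its default of 1)
  ["PUSH", 1],         // positional count
  ["PUSH", 0],         // named count
  ["CALL"],
  ["HALT"]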
])

const result = await run(factorial)  // { type: "number", value: 120 }
```

## String Format

### Functions
```
MAKE_FUNCTION (x y) .body       ; Basic
MAKE_FUNCTION (x=10 y=20) .body ; Defaults
MAKE_FUNCTION (x ...rest) .body ; Variadic
MAKE_FUNCTION (x @named) .body  ; Named args
MAKE_FUNCTION (x ...rest @named) .body ; Both
```

### Function Calls
Stack order (bottom to top):
```
LOAD fn
PUSH arg1           ; Positional args
PUSH arg2
PUSH "name"         ; Named arg key
PUSH "value"        ; Named arg value
PUSH 2              ; Positional count
PUSH 1              ; Named count
CALL
```

## Opcodes

### Stack
- `PUSH <const>` - Push constant
- `POP` - Remove top
- `DUP` - Duplicate top

### Variables
- `LOAD <name>` - Push variable value (throws if not found)
- `TRY_LOAD <name>` - Push variable value if found, otherwise push name as string (never throws)
- `STORE <name>` - Pop and store in variable

### Arithmetic
- `ADD`, `SUB`, `MUL`, `DIV`, `MOD` - Binary ops (pop 2, push result)

### Comparison
- `EQ`, `NEQ`, `LT`, `GT`, `LTE`, `GTE` - Pop 2, push boolean

### Logic
- `NOT` - Pop 1, push !value

### Control Flow
- `JUMP .label` - Unconditional jump
- `JUMP_IF_FALSE .label` - Jump if top is false or null (pops value)
- `JUMP_IF_TRUE .label` - Jump if top is truthy (pops value)
- `HALT` - Stop execution of the program

### Functions
- `MAKE_FUNCTION (params) .body` - Create function, push to stack
- `CALL` - Call function (see calling convention above)
- `TAIL_CALL` - Tail-recursive call (no stack growth)
- `RETURN` - Return from function (pops return value)
- `TRY_CALL <name>` - If `<name>` holds a function, call it; if it holds any other value, push that value; if it is undefined, push the name as a string
- `BREAK` - Exit iterator/loop (unwinds to break target)

### Arrays
- `MAKE_ARRAY #N` - Pop N items, push array
- `ARRAY_GET` - Pop index and array, push element
- `ARRAY_SET` - Pop value, index, array; mutate array
- `ARRAY_PUSH` - Pop value and array, append to array
- `ARRAY_LEN` - Pop array, push length

### Dicts
- `MAKE_DICT #N` - Pop N key-value pairs, push dict
- `DICT_GET` - Pop key and dict, push value (or null)
- `DICT_SET` - Pop value, key, dict; mutate dict
- `DICT_HAS` - Pop key and dict, push boolean

### Unified Access
- `DOT_GET` - Pop index/key and array/dict, push value (null if missing)

### Strings
- `STR_CONCAT #N` - Pop N values, convert to strings, concatenate, push result

### Exceptions
- `PUSH_TRY .catch` - Register exception handler
- `PUSH_FINALLY .finally` - Add finally to current handler
- `POP_TRY` - Remove handler (try succeeded)
- `THROW` - Throw exception (pops error value)

## Compiler Patterns

### Function Definitions

When defining functions, you must prevent the PC from "falling through" into the function body during sequential execution. There are two standard patterns:

**Pattern 1: JUMP over function bodies (Recommended)**
```
MAKE_FUNCTION (params) .body
STORE function_name
JUMP .end              ; Skip over function body
.body:
  <function code>
  RETURN
.end:
  <continue with program>
```

**Pattern 2: Function bodies after HALT**
```
MAKE_FUNCTION (params) .body
STORE function_name
<use the function>
HALT                   ; Stop execution before function bodies
.body:
  <function code>
  RETURN
```

**Important**: Pattern 2 only works if you HALT before reaching function bodies. Pattern 1 is more flexible and is required for:
- Defining multiple functions before using them
- REPL mode (incremental execution)
- Any case where execution continues after defining a function

**Why?** `MAKE_FUNCTION` creates a function value but doesn't jump to the body—it just stores the body's address. Without JUMP or HALT, the PC increments into the function body and executes it as top-level code.
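A compiler can wrap Pattern 1 in a small helper so that every function definition gets its JUMP automatically. The following is a minimal sketch in the array format; the `Instr` type, the `emitFunctionDef` helper, and the label naming scheme are hypothetical and not part of ReefVM's API.

```typescript
// Hypothetical local types; ReefVM's own instruction types may differ.
type Atom = number | string | boolean | null
type Instr = [string] | [string, Atom] | [string, string[], string]

// Emit Pattern 1: create the function, store it, then jump over its body.
function emitFunctionDef(name: string, params: string[], body: Instr[]): Instr[] {
  const bodyLabel = `.${name}_body`
  const endLabel = `.${name}_end`
  return [
    ["MAKE_FUNCTION", params, bodyLabel],
    ["STORE", name],
    ["JUMP", endLabel],   // skip the body during top-level execution
    [`${bodyLabel}:`],
    ...body,
    ["RETURN"],
    [`${endLabel}:`],     // top-level code continues here
  ]
}

// Example: a function that doubles its argument.
const doubleDef = emitFunctionDef("double", ["x"], [
  ["LOAD", "x"],
  ["PUSH", 2],
  ["MUL"],
])
```

Because the sequence never relies on HALT, the output can be concatenated with other definitions or appended in REPL mode.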

### If-Else
```
<condition>
JUMP_IF_FALSE .else
  <then-block>
  JUMP .end
.else:
  <else-block>
.end:
```
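The same pattern can be generated programmatically. This sketch assumes labels should be unique within a program, so it numbers them with a counter; `Instr`, `emitIfElse`, and the label names are illustrative, not ReefVM API.

```typescript
// Hypothetical local types for array-format instructions.
type Atom = number | string | boolean | null
type Instr = [string] | [string, Atom]

let ifCounter = 0 // keeps labels distinct across repeated or nested ifs

function emitIfElse(cond: Instr[], thenBlock: Instr[], elseBlock: Instr[]): Instr[] {
  const id = ifCounter++
  const elseLabel = `.else_${id}`
  const endLabel = `.end_${id}`
  return [
    ...cond,
    ["JUMP_IF_FALSE", elseLabel], // pops the condition value
    ...thenBlock,
    ["JUMP", endLabel],           // skip the else-block
    [`${elseLabel}:`],
    ...elseBlock,
    [`${endLabel}:`],
  ]
}

// if (x > 0) "positive" else "non-positive"
const ifElse = emitIfElse(
  [["LOAD", "x"], ["PUSH", 0], ["GT"]],
  [["PUSH", "positive"]],
  [["PUSH", "non-positive"]],
)
```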

### While Loop
```
.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  JUMP .loop
.end:
```

### For Loop
```
<init>
.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  <increment>
  JUMP .loop
.end:
```
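As a sketch of how a compiler might emit this loop shape in the array format, the helper below stitches the four parts together around generated labels. `emitFor` and its `id` parameter (used only to keep labels distinct) are hypothetical names.

```typescript
// Hypothetical local types for array-format instructions.
type Atom = number | string | boolean | null
type Instr = [string] | [string, Atom]

function emitFor(id: number, init: Instr[], cond: Instr[], body: Instr[], step: Instr[]): Instr[] {
  const loopLabel = `.loop_${id}`
  const endLabel = `.end_${id}`
  return [
    ...init,                   // runs once, before the loop label
    [`${loopLabel}:`],
    ...cond,
    ["JUMP_IF_FALSE", endLabel],
    ...body,
    ...step,                   // increment runs after every iteration
    ["JUMP", loopLabel],
    [`${endLabel}:`],
  ]
}

// for (i = 0; i < 3; i = i + 1) { sum = sum + i }
const loopCode = emitFor(
  0,
  [["PUSH", 0], ["STORE", "i"]],
  [["LOAD", "i"], ["PUSH", 3], ["LT"]],
  [["LOAD", "sum"], ["LOAD", "i"], ["ADD"], ["STORE", "sum"]],
  [["LOAD", "i"], ["PUSH", 1], ["ADD"], ["STORE", "i"]],
)
```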

### Continue
No CONTINUE opcode. Use backward jump to loop start:
```
.loop:
  <condition>
  JUMP_IF_FALSE .end
  <early-check>
  JUMP_IF_TRUE .loop    ; continue
  <body>
  JUMP .loop
.end:
```

### Break in Loop
Mark iterator function as break target, use BREAK opcode:
```
MAKE_FUNCTION () .each_body
STORE each
LOAD collection
LOAD each
<call-iterator-with-break-semantics>
HALT

.each_body:
  <condition>
  JUMP_IF_TRUE .done
  <body>
  BREAK                  ; exits to caller
.done:
  RETURN
```

### Short-Circuit AND
```
<left>
DUP
JUMP_IF_FALSE .end      ; Short-circuit if false
POP
<right>
.end:                    ; Result on stack
```

### Short-Circuit OR
```
<left>
DUP
JUMP_IF_TRUE .end       ; Short-circuit if true
POP
<right>
.end:                    ; Result on stack
```
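The AND and OR patterns differ only in the conditional jump, so one helper can cover both. A minimal sketch, assuming the caller supplies a unique end label; `emitShortCircuit` is a hypothetical name.

```typescript
// Hypothetical local types for array-format instructions.
type Atom = number | string | boolean | null
type Instr = [string] | [string, Atom]

function emitShortCircuit(
  op: "and" | "or",
  left: Instr[],
  right: Instr[],
  endLabel: string,
): Instr[] {
  const jump = op === "and" ? "JUMP_IF_FALSE" : "JUMP_IF_TRUE"
  return [
    ...left,
    ["DUP"],            // the conditional jump pops one copy
    [jump, endLabel],   // short-circuit: the left value stays as the result
    ["POP"],            // otherwise discard the left value
    ...right,           // and leave the right value as the result
    [`${endLabel}:`],
  ]
}

// a and b
const andCode = emitShortCircuit("and", [["LOAD", "a"]], [["LOAD", "b"]], ".and_end")
// a or b
const orCode = emitShortCircuit("or", [["LOAD", "a"]], [["LOAD", "b"]], ".or_end")
```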

### Try-Catch
```
PUSH_TRY .catch
  <try-block>
  POP_TRY
  JUMP .end
.catch:
  STORE err
  <catch-block>
.end:
```

### Try-Catch-Finally
```
PUSH_TRY .catch
PUSH_FINALLY .finally
  <try-block>
  POP_TRY
  JUMP .finally         ; Compiler must generate this
.catch:
  STORE err
  <catch-block>
  JUMP .finally         ; And this
.finally:
  <finally-block>       ; Executes in both paths
.end:
```

**Important**: VM only auto-jumps to finally on THROW. For successful try/catch, compiler must explicitly JUMP to finally.
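A helper makes it harder to forget the two explicit jumps. The sketch below emits the try-catch-finally layout in the array format; `emitTryCatchFinally`, its parameters, and the label names are hypothetical.

```typescript
// Hypothetical local types for array-format instructions.
type Atom = number | string | boolean | null
type Instr = [string] | [string, Atom]

function emitTryCatchFinally(
  id: number,
  tryBlock: Instr[],
  errName: string,
  catchBlock: Instr[],
  finallyBlock: Instr[],
): Instr[] {
  const catchLabel = `.catch_${id}`
  const finallyLabel = `.finally_${id}`
  return [
    ["PUSH_TRY", catchLabel],
    ["PUSH_FINALLY", finallyLabel],
    ...tryBlock,
    ["POP_TRY"],
    ["JUMP", finallyLabel],   // success path: the compiler must emit this
    [`${catchLabel}:`],
    ["STORE", errName],       // the thrown value is on the stack
    ...catchBlock,
    ["JUMP", finallyLabel],   // catch path: the compiler must emit this too
    [`${finallyLabel}:`],
    ...finallyBlock,
  ]
}

const guarded = emitTryCatchFinally(
  0,
  [["PUSH", "boom"], ["THROW"]],
  "err",
  [["LOAD", "err"], ["STORE", "message"]],
  [["PUSH", true], ["STORE", "cleaned"]],
)
```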

### Closures
Functions automatically capture current scope:
```
PUSH 0
STORE counter
MAKE_FUNCTION () .increment
STORE increment_fn
JUMP .main

.increment:
  LOAD counter          ; Captured variable
  PUSH 1
  ADD
  STORE counter
  LOAD counter
  RETURN

.main:
  LOAD increment_fn
  PUSH 0
  PUSH 0
  CALL                  ; Returns 1
  POP
  LOAD increment_fn
  PUSH 0
  PUSH 0
  CALL                  ; Returns 2 (counter persists!)
  HALT
```

### Tail Recursion
Use TAIL_CALL instead of CALL for last call:
```
MAKE_FUNCTION (n acc) .factorial
STORE factorial
JUMP .main

.factorial:
  LOAD n
  PUSH 0
  LTE
  JUMP_IF_FALSE .recurse
  LOAD acc
  RETURN
.recurse:
  LOAD factorial
  LOAD n
  PUSH 1
  SUB
  LOAD n
  LOAD acc
  MUL
  PUSH 2
  PUSH 0
  TAIL_CALL             ; Reuses stack frame

.main:
  LOAD factorial
  PUSH 5
  PUSH 1
  PUSH 2
  PUSH 0
  CALL                  ; factorial(5, 1) = 120
  HALT
```

### Optional Function Calls (TRY_CALL)
Call function if defined, otherwise use value or name as string:
```
; Define optional hook
MAKE_FUNCTION () .onInit
STORE onInit

; Later: call if defined, skip if not
TRY_CALL onInit       ; Calls onInit() if it's a function
                       ; Pushes value if it exists but isn't a function
                       ; Pushes "onInit" as string if undefined

; Use with values
PUSH 42
STORE answer
TRY_CALL answer        ; Pushes 42 (not a function)

; Use with undefined
TRY_CALL unknown       ; Pushes "unknown" as string
```

**Use Cases**:
- Optional hooks/callbacks in DSLs
- Shell-like languages where unknown identifiers become strings
- Templating systems with optional transformers

### String Concatenation
Build strings from multiple values:
```
; Simple concatenation
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3           ; → "Hello World"

; With variables
PUSH "Name: "
LOAD userName
STR_CONCAT #2           ; → "Name: Alice"

; With expressions and type coercion
PUSH "Result: "
PUSH 10
PUSH 5
ADD
STR_CONCAT #2           ; → "Result: 15"

; Template-like interpolation
PUSH "User "
LOAD userId
PUSH " has "
LOAD count
PUSH " items"
STR_CONCAT #5           ; → "User 42 has 3 items"
```

**Composability**: Results can be concatenated again
```
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3
PUSH "!"
STR_CONCAT #2           ; → "Hello World!"
```

### Unified Access (DOT_GET)
DOT_GET provides a single opcode for accessing both arrays and dicts:

```
; Array access
PUSH 10
PUSH 20
PUSH 30
MAKE_ARRAY #3
PUSH 1
DOT_GET                 ; → 20

; Dict access
PUSH 'name'
PUSH 'Alice'
MAKE_DICT #1
PUSH 'name'
DOT_GET                 ; → 'Alice'
```

**Chained access**:
```
; Access dict['users'][0]['name']
LOAD dict
PUSH 'users'
DOT_GET                 ; Get users array
PUSH 0
DOT_GET                 ; Get first user
PUSH 'name'
DOT_GET                 ; Get name field
```

**With variables**:
```
LOAD data
LOAD key                ; Key can be string or number
DOT_GET                 ; Works for both array and dict
```

**Null safety**: Returns null for missing keys or out-of-bounds indices
```
MAKE_ARRAY #0
PUSH 0
DOT_GET                 ; → null (empty array)

MAKE_DICT #0
PUSH 'key'
DOT_GET                 ; → null (missing key)
```
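For intuition, the lookup rule can be modeled on plain JavaScript values. This is only a sketch of the behavior described above, not the VM's implementation: `dotGet` and the `Json` type are hypothetical, and the handling of non-integer indices, non-string dict keys, and non-container targets is an assumption the guide does not confirm.

```typescript
// Hypothetical model of Reef values as plain JavaScript data.
type Dict = { [key: string]: Json }
type Json = number | string | boolean | null | Json[] | Dict

function dotGet(target: Json, key: Json): Json {
  if (Array.isArray(target)) {
    // Assumption: only in-bounds, non-negative integer indices match.
    const inBounds =
      typeof key === "number" && Math.floor(key) === key && key >= 0 && key < target.length
    return inBounds ? target[key as number] : null
  }
  if (target !== null && typeof target === "object") {
    // Assumption: dict keys are matched as strings.
    const found = typeof key === "string" && Object.prototype.hasOwnProperty.call(target, key)
    return found ? target[key as string] : null
  }
  // The guide does not say what happens here; throwing is an assumption.
  throw new Error("dotGet: target is neither array nor dict")
}

const second = dotGet([10, 20, 30], 1)
const missingIndex = dotGet([], 0)
const userName = dotGet({ name: "Alice" }, "name")
const missingKey = dotGet({}, "key")
```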

## Key Concepts

### Truthiness
Only `null` and `false` are falsy. Everything else (including `0`, `""`, empty arrays/dicts) is truthy.
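This rule is narrower than JavaScript's, which matters when a compiler or native function evaluates conditions on the host side. A minimal sketch over plain values; `isTruthy` and the `Value` type here are hypothetical stand-ins, not ReefVM exports.

```typescript
// Hypothetical model of Reef values as plain JavaScript data.
type Value = number | string | boolean | null | Value[] | { [key: string]: Value }

// Reef truthiness: only null and false are falsy.
function isTruthy(value: Value): boolean {
  return value !== null && value !== false
}

// Values where Reef and JavaScript disagree: JavaScript treats 0 and "" as falsy.
const zeroIsTruthy = isTruthy(0)
const emptyStringIsTruthy = isTruthy("")
const emptyArrayIsTruthy = isTruthy([])
```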

### Type Coercion

**toNumber**:
- `number` → identity
- `string` → parseFloat (or 0 if invalid)
- `boolean` → 1 (true) or 0 (false)
- `null` → 0
- Others → 0

**toString**:
- `string` → identity
- `number` → string representation
- `boolean` → "true" or "false"
- `null` → "null"
- `function` → "<function>"
- `array` → "[item, item]"
- `dict` → "{key: value, ...}"

**Arithmetic ops** (ADD, SUB, MUL, DIV, MOD) coerce both operands to numbers.

**Comparison ops** (LT, GT, LTE, GTE) coerce both operands to numbers.

**Equality ops** (EQ, NEQ) use type-aware comparison with deep equality for arrays/dicts.

**Note**: ADD never concatenates strings; it coerces both operands to numbers. Use `STR_CONCAT` to build strings.
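The coercion tables can be sketched over plain JavaScript values as follows. The function names and the `Value` type are hypothetical, functions are left out of the model, and the exact formatting of nested items (for example, whether strings inside arrays are quoted) is an assumption.

```typescript
// Hypothetical model of Reef values as plain JavaScript data (functions omitted).
type Value = number | string | boolean | null | Value[] | { [key: string]: Value }

function toNumber(value: Value): number {
  if (typeof value === "number") return value
  if (typeof value === "string") {
    const parsed = parseFloat(value)
    return isNaN(parsed) ? 0 : parsed // invalid strings become 0
  }
  if (typeof value === "boolean") return value ? 1 : 0
  return 0 // null, arrays, dicts
}

function toStr(value: Value): string {
  if (value === null) return "null"
  if (Array.isArray(value)) return `[${value.map((item) => toStr(item)).join(", ")}]`
  if (typeof value === "object") {
    const pairs = Object.keys(value).map((key) => `${key}: ${toStr(value[key])}`)
    return `{${pairs.join(", ")}}`
  }
  return String(value) // strings, numbers, booleans
}

// ADD-style arithmetic coerces both sides, so "10" and 5 add numerically.
const sum = toNumber("10") + toNumber(5)
```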

### Scope
- Variables resolved through parent scope chain
- STORE updates existing variable or creates in current scope
- Functions capture scope at definition time

### Identifiers
Variable and function parameter names support Unicode and emoji:
- Valid: `💎`, `🌟`, `変数`, `counter`, `_private`
- Invalid: Cannot start with digits, `.`, `#`, `@`, or `...`
- Invalid: Cannot contain whitespace or special chars: `;`, `()`, `[]`, `{}`, `=`, `'`, `"`
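A compiler front end may want to reject bad names before emitting LOAD or STORE. The check below is a sketch built only from the rules listed above; `isValidIdentifier` is a hypothetical helper, and the real parser may apply further restrictions.

```typescript
// Cannot start with a digit, ".", "#", or "@" (the "." case also covers "...").
const FORBIDDEN_START = /^[0-9.#@]/
// Cannot contain whitespace or any of ; ( ) [ ] { } = ' "
const FORBIDDEN_ANYWHERE = /[\s;()\[\]{}='"]/

function isValidIdentifier(name: string): boolean {
  return name.length > 0 && !FORBIDDEN_START.test(name) && !FORBIDDEN_ANYWHERE.test(name)
}

const accepted = ["💎", "🌟", "変数", "counter", "_private"].every(isValidIdentifier)
const rejected = ["2x", ".x", "#n", "@opts", "...rest", "a b", "x=1"].some(isValidIdentifier)
```

Here `accepted` is `true` and `rejected` is `false`.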

### Break Semantics
- CALL marks current frame as break target
- BREAK unwinds call stack to that target
- Used for Ruby-style iterator pattern

### Parameter Binding Priority
For function calls, each parameter is bound from the first source that applies:
1. Named argument (if provided and matches param name)
2. Positional argument (if provided)
3. Default value (if defined)
4. Null

### Exception Handlers
- PUSH_TRY uses absolute addresses for catch blocks
- Nested try blocks form a stack
- THROW unwinds to most recent handler and jumps to finally (if present) or catch
- VM does NOT jump to finally when the try or catch block completes normally; the compiler must generate those JUMPs
- Making sure finally runs on every path is therefore the compiler's responsibility, not the VM's

### Calling Convention
All calls (including native functions) push arguments in order:
1. Function
2. Positional args (in order)
3. Named args (key1, val1, key2, val2, ...)
4. Positional count (as number)
5. Named count (as number)
6. CALL or TAIL_CALL

Native functions use the same calling convention as Reef functions. They are registered into scope and called via LOAD + CALL.
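The push order can be captured in one helper so that the two counts always match the arguments actually emitted. A minimal sketch in the array format; `emitCall` and the `Instr` type are hypothetical names, not ReefVM API.

```typescript
// Hypothetical local types for array-format instructions.
type Atom = number | string | boolean | null
type Instr = [string] | [string, Atom]

function emitCall(
  fnName: string,
  positional: Instr[][],              // one instruction sequence per argument
  named: Array<[string, Instr[]]>,    // [key, instruction sequence for the value]
  tail = false,
): Instr[] {
  const code: Instr[] = [["LOAD", fnName]]
  for (const arg of positional) code.push(...arg)
  for (const [key, value] of named) code.push(["PUSH", key], ...value)
  code.push(["PUSH", positional.length]) // positional count
  code.push(["PUSH", named.length])      // named count
  code.push([tail ? "TAIL_CALL" : "CALL"])
  return code
}

// configure("myApp", debug: true, port: 8080)
const callCode = emitCall(
  "configure",
  [[["PUSH", "myApp"]]],
  [["debug", [["PUSH", true]]], ["port", [["PUSH", 8080]]]],
)
```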

### Registering Native Functions

Native TypeScript functions are registered into the VM's scope and accessed like regular variables.

**Method 1**: Pass to `run()` or `VM` constructor
```typescript
const result = await run(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`
})

// Or with VM
const vm = new VM(bytecode, { add, greet })
```

**Method 2**: Register after construction
```typescript
const vm = new VM(bytecode)
vm.set('add', (a: number, b: number) => a + b)
await vm.run()
```

**Method 3**: Value-based functions (for full control)
```typescript
vm.setValueFunction('customOp', (a: Value, b: Value): Value => {
  return { type: 'number', value: toNumber(a) + toNumber(b) }
})
```

**Auto-wrapping**: `vm.set()` automatically converts between native TypeScript types and ReefVM Value types. Both sync and async functions work.

**Usage in bytecode**:
```
; Positional arguments
LOAD add              ; Load native function from scope
PUSH 5
PUSH 10
PUSH 2                ; positionalCount
PUSH 0                ; namedCount
CALL                  ; Call like any other function

; Named arguments (assumes greet accepts both name and greeting)
LOAD greet
PUSH "name"
PUSH "Alice"
PUSH "greeting"
PUSH "Hi"
PUSH 0                ; positionalCount
PUSH 2                ; namedCount
CALL                  ; → "Hi, Alice!"
```

**Named Arguments**: Native functions support named arguments. Parameter names are extracted from the function signature at call time, and arguments are bound using the same priority as Reef functions (named arg > positional arg > default > null).

**@named Pattern**: Parameters starting with `at` followed by an uppercase letter (e.g., `atOptions`, `atNamed`) collect unmatched named arguments:

```typescript
// Basic @named - collects all named args
vm.set('greet', (atNamed: any = {}) => {
  return `Hello, ${atNamed.name || 'World'}!`
})

// Mixed positional and @named
vm.set('configure', (name: string, atOptions: any = {}) => {
  return {
    name,
    debug: atOptions.debug || false,
    port: atOptions.port || 3000
  }
})
```

Bytecode example:
```
; Call with mixed positional and named args
LOAD configure
PUSH "myApp"        ; positional arg → name
PUSH "debug"
PUSH true
PUSH "port"
PUSH 8080
PUSH 1              ; 1 positional arg
PUSH 2              ; 2 named args (debug, port)
CALL                ; atOptions receives {debug: true, port: 8080}
```

Named arguments that match fixed parameter names are bound to those parameters. Remaining unmatched named arguments are collected into the `atXxx` parameter as a plain JavaScript object.
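The split between matched and collected named arguments can be illustrated as follows. This is a sketch of the rule as described, not the VM's code: `isAtNamedParam` and `splitNamedArgs` are hypothetical helpers, and what happens to unmatched names when no `atXxx` parameter exists is not modeled.

```typescript
type NamedArgs = { [key: string]: unknown }

// "at" followed by an uppercase letter marks the collecting parameter.
const isAtNamedParam = (param: string): boolean => /^at[A-Z]/.test(param)

function splitNamedArgs(
  params: string[],
  named: NamedArgs,
): { bound: NamedArgs; rest: NamedArgs } {
  const fixed = params.filter((param) => !isAtNamedParam(param))
  const bound: NamedArgs = {}
  const rest: NamedArgs = {}
  for (const key of Object.keys(named)) {
    if (fixed.indexOf(key) !== -1) bound[key] = named[key] // matches a fixed parameter
    else rest[key] = named[key]                            // collected into atXxx
  }
  return { bound, rest }
}

const split = splitNamedArgs(["name", "atOptions"], { name: "myApp", debug: true, port: 8080 })
// split.bound is { name: "myApp" }; split.rest is { debug: true, port: 8080 }
```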

### Calling Functions from TypeScript

You can call both Reef and native functions from TypeScript using `vm.call()`:

```typescript
const bytecode = toBytecode(`
  MAKE_FUNCTION (name greeting="Hello") .greet
  STORE greet
  HALT

  .greet:
    LOAD greeting
    PUSH " "
    LOAD name
    PUSH "!"
    STR_CONCAT #4
    RETURN
`)

const vm = new VM(bytecode, {
  log: (msg: string) => console.log(msg)  // Native function
})
await vm.run()

// Call Reef function with positional arguments
const result1 = await vm.call('greet', 'Alice')
// Returns: "Hello Alice!"

// Call Reef function with named arguments (pass as final object)
const result2 = await vm.call('greet', 'Bob', { greeting: 'Hi' })
// Returns: "Hi Bob!"

// Call Reef function with only named arguments
const result3 = await vm.call('greet', { name: 'Carol', greeting: 'Hey' })
// Returns: "Hey Carol!"

// Call native function
await vm.call('log', 'Hello from TypeScript!')
```

**How it works**:
- `vm.call(functionName, ...args)` looks up the function (Reef or native) in the VM's scope
- For Reef functions: converts to callable JavaScript function
- For native functions: calls directly
- Arguments are automatically converted to ReefVM Values
- Returns the result (automatically converted back to JavaScript types)

**Named arguments**: Pass a plain object as the final argument to provide named arguments. If the last argument is a non-array object, it's treated as named arguments. All preceding arguments are treated as positional.
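The detection rule can be sketched like this. `splitCallArgs` is a hypothetical helper that mirrors the rule as stated, not the actual `vm.call()` implementation.

```typescript
type NamedArgs = { [key: string]: unknown }

function splitCallArgs(args: unknown[]): { positional: unknown[]; named: NamedArgs } {
  const last = args[args.length - 1]
  // A trailing non-null, non-array object is taken as the named arguments.
  const hasNamed = typeof last === "object" && last !== null && !Array.isArray(last)
  if (hasNamed) {
    return { positional: args.slice(0, -1), named: last as NamedArgs }
  }
  return { positional: args, named: {} }
}

const mixed = splitCallArgs(["Bob", { greeting: "Hi" }])
const onlyPositional = splitCallArgs(["Alice"])
const trailingArray = splitCallArgs([[1, 2]])
```

One consequence of the rule as written: a plain object passed as the final argument is read as named arguments, so it would not arrive as a positional dict.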

**Type conversion**: Arguments and return values are automatically converted between JavaScript types and ReefVM Values:
- Primitives: `number`, `string`, `boolean`, `null`
- Arrays: converted recursively
- Objects: converted to ReefVM dicts
- Functions: Reef functions are converted to callable JavaScript functions

### REPL Mode (Incremental Compilation)

ReefVM supports incremental bytecode execution for building REPLs. This allows you to execute code line-by-line while preserving scope and avoiding re-execution of side effects.

**The Problem**: By default, `vm.run()` resets the program counter (PC) to 0, re-executing all previous bytecode. This makes it impossible to implement a REPL where each line executes only once.

**The Solution**: Use `vm.continue()` to resume execution from where you left off:

```typescript
// Line 1: Define variable
const line1 = toBytecode([
  ["PUSH", 42],
  ["STORE", "x"]
])

const vm = new VM(line1)
await vm.run()  // Execute first line

// Line 2: Use the variable
const line2 = toBytecode([
  ["LOAD", "x"],
  ["PUSH", 10],
  ["ADD"]
])

vm.appendBytecode(line2)  // Append new bytecode with proper constant remapping
await vm.continue()       // Execute ONLY the new bytecode

// Result: 52 (42 + 10)
// The first line never re-executed!
```

**Key methods**:
- `vm.run()`: Resets PC to 0 and runs from the beginning (normal execution)
- `vm.continue()`: Continues from current PC (REPL mode)
- `vm.appendBytecode(bytecode)`: Helper that properly appends bytecode with constant index remapping

**Important**: Don't use `HALT` in REPL mode! The VM naturally stops when it runs out of instructions. Using `HALT` sets `vm.stopped = true`, which prevents `continue()` from resuming.

**Example REPL pattern**:
```typescript
const vm = new VM(toBytecode([]), { /* native functions */ })

while (true) {
  const input = await getUserInput()  // Get next line from user
  const bytecode = compileLine(input)  // Compile to bytecode (no HALT!)

  vm.appendBytecode(bytecode)  // Append to VM
  const result = await vm.continue()  // Execute only the new code

  console.log(fromValue(result))  // Show result to user
}
```

This pattern ensures:
- Variables persist between lines
- Side effects (like `echo` or function calls) only run once
- Previous bytecode never re-executes
- Scope accumulates across all lines

### Empty Stack
- RETURN with empty stack returns null
- HALT with empty stack returns null