ReefVM/GUIDE.md
Chris Wanstrath df9af925d3 STR_CONCAT #n
2025-10-14 12:14:27 -07:00

553 lines
12 KiB
Markdown

# Reef Compiler Guide
Quick reference for compiling to Reef bytecode.
## Bytecode Formats
ReefVM supports two bytecode formats:
1. **String format**: Human-readable text with opcodes and operands
2. **Array format**: TypeScript arrays with typed tuples for programmatic generation
Both formats are compiled using the same `toBytecode()` function.
## Bytecode Syntax
### Instructions
```
OPCODE operand ; comment
```
### Operand Types
**Immediate numbers** (`#N`): Counts or relative offsets
- `MAKE_ARRAY #3` - count of 3 items
- `JUMP #5` - relative offset of 5 instructions (prefer labels)
- `PUSH_TRY #10` - absolute instruction index (prefer labels)
**Labels** (`.name`): Symbolic addresses resolved at parse time
- `.label:` - define label at current position
- `JUMP .loop` - jump to label
- `MAKE_FUNCTION (x) .body` - function body at label
**Variable names**: Plain identifiers (supports Unicode and emoji!)
- `LOAD counter` - load variable
- `STORE result` - store variable
- `LOAD 💎` - load emoji variable
- `STORE 変数` - store Unicode variable
**Constants**: Literals added to constants pool
- Numbers: `PUSH 42`, `PUSH 3.14`
- Strings: `PUSH "hello"` or `PUSH 'world'`
- Booleans: `PUSH true`, `PUSH false`
- Null: `PUSH null`
**Native function names**: Registered TypeScript functions
- `CALL_NATIVE print`
## Array Format
The programmatic array format uses TypeScript tuples for type safety:
```typescript
import { toBytecode, run } from "#reef"
const bytecode = toBytecode([
["PUSH", 42], // Atom values: number | string | boolean | null
["STORE", "x"], // Variable names as strings
["LOAD", "x"],
["HALT"]
])
const result = await run(bytecode)
```
### Operand Types in Array Format
**Atoms** (`number | string | boolean | null`): Constants for PUSH
```typescript
["PUSH", 42]
["PUSH", "hello"]
["PUSH", true]
["PUSH", null]
```
**Variable names**: String identifiers
```typescript
["LOAD", "counter"]
["STORE", "result"]
```
**Label definitions**: Single-element arrays starting with `.` and ending with `:`
```typescript
[".loop:"]
[".end:"]
[".function_body:"]
```
**Label references**: Strings in jump/function instructions
```typescript
["JUMP", ".loop"]
["JUMP_IF_FALSE", ".end"]
["MAKE_FUNCTION", ["x", "y"], ".body"]
["PUSH_TRY", ".catch"]
```
**Counts**: Numbers for array/dict construction
```typescript
["MAKE_ARRAY", 3] // Pop 3 items
["MAKE_DICT", 2] // Pop 2 key-value pairs
```
**Native function names**: Strings for registered functions
```typescript
["CALL_NATIVE", "print"]
```
### Functions in Array Format
```typescript
// Basic function
["MAKE_FUNCTION", ["x", "y"], ".body"]
// With defaults
["MAKE_FUNCTION", ["x", "y=10"], ".body"]
// Variadic
["MAKE_FUNCTION", ["...args"], ".body"]
// Named args
["MAKE_FUNCTION", ["@opts"], ".body"]
// Mixed
["MAKE_FUNCTION", ["x", "y=5", "...rest", "@opts"], ".body"]
```
### Complete Example
```typescript
const factorial = toBytecode([
["MAKE_FUNCTION", ["n", "acc=1"], ".fact"],
["STORE", "factorial"],
["JUMP", ".main"],
[".fact:"],
["LOAD", "n"],
["PUSH", 0],
["LTE"],
["JUMP_IF_FALSE", ".recurse"],
["LOAD", "acc"],
["RETURN"],
[".recurse:"],
["LOAD", "factorial"],
["LOAD", "n"],
["PUSH", 1],
["SUB"],
["LOAD", "n"],
["LOAD", "acc"],
["MUL"],
["PUSH", 2],
["PUSH", 0],
["TAIL_CALL"],
[".main:"],
["LOAD", "factorial"],
["PUSH", 5],
["PUSH", 1],
["PUSH", 0],
["CALL"],
["HALT"]
])
const result = await run(factorial) // { type: "number", value: 120 }
```
## String Format
### Functions
```
MAKE_FUNCTION (x y) .body ; Basic
MAKE_FUNCTION (x=10 y=20) .body ; Defaults
MAKE_FUNCTION (x ...rest) .body ; Variadic
MAKE_FUNCTION (x @named) .body ; Named args
MAKE_FUNCTION (x ...rest @named) .body ; Both
```
### Function Calls
Stack order (bottom to top):
```
LOAD fn
PUSH arg1 ; Positional args
PUSH arg2
PUSH "name" ; Named arg key
PUSH "value" ; Named arg value
PUSH 2 ; Positional count
PUSH 1 ; Named count
CALL
```
## Opcodes
### Stack
- `PUSH <const>` - Push constant
- `POP` - Remove top
- `DUP` - Duplicate top
### Variables
- `LOAD <name>` - Push variable value (throws if not found)
- `TRY_LOAD <name>` - Push variable value if found, otherwise push name as string (never throws)
- `STORE <name>` - Pop and store in variable
### Arithmetic
- `ADD`, `SUB`, `MUL`, `DIV`, `MOD` - Binary ops (pop 2, push result)
### Comparison
- `EQ`, `NEQ`, `LT`, `GT`, `LTE`, `GTE` - Pop 2, push boolean
### Logic
- `NOT` - Pop 1, push !value
### Control Flow
- `JUMP .label` - Unconditional jump
- `JUMP_IF_FALSE .label` - Jump if top is false or null (pops value)
- `JUMP_IF_TRUE .label` - Jump if top is truthy (pops value)
- `HALT` - Stop execution of the program
### Functions
- `MAKE_FUNCTION (params) .body` - Create function, push to stack
- `CALL` - Call function (see calling convention above)
- `TAIL_CALL` - Tail-recursive call (no stack growth)
- `RETURN` - Return from function (pops return value)
- `TRY_CALL <name>` - Call function (if found), push value (if exists), or push name as string (if not found)
- `BREAK` - Exit iterator/loop (unwinds to break target)
### Arrays
- `MAKE_ARRAY #N` - Pop N items, push array
- `ARRAY_GET` - Pop index and array, push element
- `ARRAY_SET` - Pop value, index, array; mutate array
- `ARRAY_PUSH` - Pop value and array, append to array
- `ARRAY_LEN` - Pop array, push length
### Dicts
- `MAKE_DICT #N` - Pop N key-value pairs, push dict
- `DICT_GET` - Pop key and dict, push value (or null)
- `DICT_SET` - Pop value, key, dict; mutate dict
- `DICT_HAS` - Pop key and dict, push boolean
### Strings
- `STR_CONCAT #N` - Pop N values, convert to strings, concatenate, push result
### Exceptions
- `PUSH_TRY .catch` - Register exception handler
- `PUSH_FINALLY .finally` - Add finally to current handler
- `POP_TRY` - Remove handler (try succeeded)
- `THROW` - Throw exception (pops error value)
### Native
- `CALL_NATIVE <name>` - Call registered TypeScript function (consumes entire stack as args)
## Compiler Patterns
### If-Else
```
<condition>
JUMP_IF_FALSE .else
<then-block>
JUMP .end
.else:
<else-block>
.end:
```
### While Loop
```
.loop:
<condition>
JUMP_IF_FALSE .end
<body>
JUMP .loop
.end:
```
### For Loop
```
<init>
.loop:
<condition>
JUMP_IF_FALSE .end
<body>
<increment>
JUMP .loop
.end:
```
### Continue
No CONTINUE opcode. Use backward jump to loop start:
```
.loop:
<condition>
JUMP_IF_FALSE .end
<early-check>
JUMP_IF_TRUE .loop ; continue
<body>
JUMP .loop
.end:
```
### Break in Loop
Mark iterator function as break target, use BREAK opcode:
```
MAKE_FUNCTION () .each_body
STORE each
LOAD collection
LOAD each
<call-iterator-with-break-semantics>
HALT
.each_body:
<condition>
JUMP_IF_TRUE .done
<body>
BREAK ; exits to caller
.done:
RETURN
```
### Short-Circuit AND
```
<left>
DUP
JUMP_IF_FALSE .end ; Short-circuit if false
POP
<right>
.end: ; Result on stack
```
### Short-Circuit OR
```
<left>
DUP
JUMP_IF_TRUE .end ; Short-circuit if true
POP
<right>
.end: ; Result on stack
```
### Try-Catch
```
PUSH_TRY .catch
<try-block>
POP_TRY
JUMP .end
.catch:
STORE err
<catch-block>
.end:
```
### Try-Catch-Finally
```
PUSH_TRY .catch
PUSH_FINALLY .finally
<try-block>
POP_TRY
JUMP .finally ; Compiler must generate this
.catch:
STORE err
<catch-block>
JUMP .finally ; And this
.finally:
<finally-block> ; Executes in both paths
.end:
```
**Important**: VM only auto-jumps to finally on THROW. For successful try/catch, compiler must explicitly JUMP to finally.
### Closures
Functions automatically capture current scope:
```
PUSH 0
STORE counter
MAKE_FUNCTION () .increment
RETURN
.increment:
LOAD counter ; Captured variable
PUSH 1
ADD
STORE counter
LOAD counter
RETURN
```
### Tail Recursion
Use TAIL_CALL instead of CALL for last call:
```
MAKE_FUNCTION (n acc) .factorial
STORE factorial
<...>
.factorial:
LOAD n
PUSH 0
LTE
JUMP_IF_FALSE .recurse
LOAD acc
RETURN
.recurse:
LOAD factorial
LOAD n
PUSH 1
SUB
LOAD n
LOAD acc
MUL
PUSH 2
PUSH 0
TAIL_CALL ; Reuses stack frame
```
### Optional Function Calls (TRY_CALL)
Call function if defined, otherwise use value or name as string:
```
; Define optional hook
MAKE_FUNCTION () .onInit
STORE onInit
; Later: call if defined, skip if not
TRY_CALL onInit ; Calls onInit() if it's a function
; Pushes value if it exists but isn't a function
; Pushes "onInit" as string if undefined
; Use with values
PUSH 42
STORE answer
TRY_CALL answer ; Pushes 42 (not a function)
; Use with undefined
TRY_CALL unknown ; Pushes "unknown" as string
```
**Use Cases**:
- Optional hooks/callbacks in DSLs
- Shell-like languages where unknown identifiers become strings
- Templating systems with optional transformers
### String Concatenation
Build strings from multiple values:
```
; Simple concatenation
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3 ; → "Hello World"
; With variables
PUSH "Name: "
LOAD userName
STR_CONCAT #2 ; → "Name: Alice"
; With expressions and type coercion
PUSH "Result: "
PUSH 10
PUSH 5
ADD
STR_CONCAT #2 ; → "Result: 15"
; Template-like interpolation
PUSH "User "
LOAD userId
PUSH " has "
LOAD count
PUSH " items"
STR_CONCAT #5 ; → "User 42 has 3 items"
```
**Composability**: Results can be concatenated again
```
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3
PUSH "!"
STR_CONCAT #2 ; → "Hello World!"
```
## Key Concepts
### Truthiness
Only `null` and `false` are falsy. Everything else (including `0`, `""`, empty arrays/dicts) is truthy.
### Type Coercion
**toNumber**:
- `number` → identity
- `string` → parseFloat (or 0 if invalid)
- `boolean` → 1 (true) or 0 (false)
- `null` → 0
- Others → 0
**toString**:
- `string` → identity
- `number` → string representation
- `boolean` → "true" or "false"
- `null` → "null"
- `function` → "<function>"
- `array` → "[item, item]"
- `dict` → "{key: value, ...}"
**Arithmetic ops** (ADD, SUB, MUL, DIV, MOD) coerce both operands to numbers.
**Comparison ops** (LT, GT, LTE, GTE) coerce both operands to numbers.
**Equality ops** (EQ, NEQ) use type-aware comparison with deep equality for arrays/dicts.
**Note**: There is no string concatenation operator. ADD only works with numbers.
### Scope
- Variables resolved through parent scope chain
- STORE updates existing variable or creates in current scope
- Functions capture scope at definition time
### Identifiers
Variable and function parameter names support Unicode and emoji:
- Valid: `💎`, `🌟`, `変数`, `counter`, `_private`
- Invalid: Cannot start with digits, `.`, `#`, `@`, or `...`
- Invalid: Cannot contain whitespace or special chars: `;`, `()`, `[]`, `{}`, `=`, `'`, `"`
### Break Semantics
- CALL marks current frame as break target
- BREAK unwinds call stack to that target
- Used for Ruby-style iterator pattern
### Parameter Binding Priority
For function calls, parameters bound in order:
1. Positional argument (if provided)
2. Named argument (if provided and matches param name)
3. Default value (if defined)
4. Null
### Exception Handlers
- PUSH_TRY uses absolute addresses for catch blocks
- Nested try blocks form a stack
- THROW unwinds to most recent handler and jumps to finally (if present) or catch
- VM does NOT automatically jump to finally on success - compiler must generate JUMPs
- Finally execution in all cases is compiler's responsibility, not VM's
### Calling Convention
All calls push arguments in order:
1. Function
2. Positional args (in order)
3. Named args (key1, val1, key2, val2, ...)
4. Positional count (as number)
5. Named count (as number)
6. CALL or TAIL_CALL
### CALL_NATIVE Behavior
Unlike CALL, CALL_NATIVE consumes the **entire stack** as arguments and clears the stack. The native function receives all values that were on the stack at the time of the call.
### Empty Stack
- RETURN with empty stack returns null
- HALT with empty stack returns null