# Reef Compiler Guide

Quick reference for compiling to Reef bytecode.

## Bytecode Formats

ReefVM supports two bytecode formats:

1. **String format**: Human-readable text with opcodes and operands
2. **Array format**: TypeScript arrays with typed tuples for programmatic generation

Both formats are compiled using the same `toBytecode()` function.

## Bytecode Syntax

### Instructions

```
OPCODE operand  ; comment
```

### Operand Types

**Immediate numbers** (`#N`): Counts, relative offsets, or absolute instruction indexes
- `MAKE_ARRAY #3` - count of 3 items
- `JUMP #5` - relative offset of 5 instructions (prefer labels)
- `PUSH_TRY #10` - absolute instruction index (prefer labels)

**Labels** (`.name`): Symbolic addresses resolved at parse time
- `.label:` - define label at current position
- `JUMP .loop` - jump to label
- `MAKE_FUNCTION (x) .body` - function body at label

**Variable names**: Plain identifiers (supports Unicode and emoji!)
- `LOAD counter` - load variable
- `STORE result` - store variable
- `LOAD 💎` - load emoji variable
- `STORE 変数` - store Unicode variable

**Constants**: Literals added to constants pool
- Numbers: `PUSH 42`, `PUSH 3.14`
- Strings: `PUSH "hello"` or `PUSH 'world'`
- Booleans: `PUSH true`, `PUSH false`
- Null: `PUSH null`

## Array Format

The programmatic array format uses TypeScript tuples for type safety:

```typescript
import { toBytecode, run } from "#reef"

const bytecode = toBytecode([
  ["PUSH", 42],           // Atom values: number | string | boolean | null
  ["STORE", "x"],         // Variable names as strings
  ["LOAD", "x"],
  ["HALT"]
])

const result = await run(bytecode)
```

### Operand Types in Array Format

**Atoms** (`number | string | boolean | null`): Constants for PUSH

```typescript
["PUSH", 42]
["PUSH", "hello"]
["PUSH", true]
["PUSH", null]
```

**Variable names**: String identifiers

```typescript
["LOAD", "counter"]
["STORE", "result"]
```

**Label definitions**: Single-element arrays starting with `.` and ending with `:`

```typescript
[".loop:"]
[".end:"]
[".function_body:"]
```

**Label references**: Strings in jump/function instructions

```typescript
["JUMP", ".loop"]
["JUMP_IF_FALSE", ".end"]
["MAKE_FUNCTION", ["x", "y"], ".body"]
["PUSH_TRY", ".catch"]
```

**Counts**: Numbers for array/dict construction

```typescript
["MAKE_ARRAY", 3]         // Pop 3 items
["MAKE_DICT", 2]          // Pop 2 key-value pairs
```

### Functions in Array Format

```typescript
// Basic function
["MAKE_FUNCTION", ["x", "y"], ".body"]

// With defaults
["MAKE_FUNCTION", ["x", "y=10"], ".body"]

// Variadic
["MAKE_FUNCTION", ["...args"], ".body"]

// Named args
["MAKE_FUNCTION", ["@opts"], ".body"]

// Mixed
["MAKE_FUNCTION", ["x", "y=5", "...rest", "@opts"], ".body"]
```

### Complete Example

```typescript
const factorial = toBytecode([
  ["MAKE_FUNCTION", ["n", "acc=1"], ".fact"],
  ["STORE", "factorial"],
  ["JUMP", ".main"],

  [".fact:"],
  ["LOAD", "n"],
  ["PUSH", 0],
  ["LTE"],
  ["JUMP_IF_FALSE", ".recurse"],
  ["LOAD", "acc"],
  ["RETURN"],

  [".recurse:"],
  ["LOAD", "factorial"],
  ["LOAD", "n"],
  ["PUSH", 1],
  ["SUB"],
  ["LOAD", "n"],
  ["LOAD", "acc"],
  ["MUL"],
  ["PUSH", 2],            // Positional count
  ["PUSH", 0],            // Named count
  ["TAIL_CALL"],

  [".main:"],
  ["LOAD", "factorial"],
  ["PUSH", 5],
  ["PUSH", 1],            // Positional count
  ["PUSH", 0],            // Named count
  ["CALL"],
  ["HALT"]
])

const result = await run(factorial)  // { type: "number", value: 120 }
```

## String Format

### Functions

```
MAKE_FUNCTION (x y) .body              ; Basic
MAKE_FUNCTION (x=10 y=20) .body        ; Defaults
MAKE_FUNCTION (x ...rest) .body        ; Variadic
MAKE_FUNCTION (x @named) .body         ; Named args
MAKE_FUNCTION (x ...rest @named) .body ; Both
```

### Function Calls

Stack order (bottom to top):

```
LOAD fn
PUSH arg1        ; Positional args
PUSH arg2
PUSH "name"      ; Named arg key
PUSH "value"     ; Named arg value
PUSH 2           ; Positional count
PUSH 1           ; Named count
CALL
```

## Opcodes

### Stack
- `PUSH <value>` - Push constant
- `POP` - Remove top
- `DUP` - Duplicate top

### Variables
- `LOAD <name>` - Push variable value (throws if not found)
- `TRY_LOAD <name>` - Push variable value if found, otherwise push name as string (never throws)
- `STORE <name>` - Pop and store in variable

### Arithmetic
- `ADD`, `SUB`, `MUL`, `DIV`, `MOD` - Binary ops (pop 2, push result)

### Comparison
- `EQ`, `NEQ`, `LT`, `GT`, `LTE`, `GTE` - Pop 2, push boolean

### Logic
- `NOT` - Pop 1, push !value

### Control Flow
- `JUMP .label` - Unconditional jump
- `JUMP_IF_FALSE .label` - Jump if top is false or null (pops value)
- `JUMP_IF_TRUE .label` - Jump if top is truthy (pops value)
- `HALT` - Stop execution of the program

### Functions
- `MAKE_FUNCTION (params) .body` - Create function, push to stack
- `CALL` - Call function (see calling convention above)
- `TAIL_CALL` - Tail-recursive call (no stack growth)
- `RETURN` - Return from function (pops return value)
- `TRY_CALL <name>` - Call function (if found), push value (if exists), or push name as string (if not found)
- `BREAK` - Exit iterator/loop (unwinds to break target)

### Arrays
- `MAKE_ARRAY #N` - Pop N items, push array
- `ARRAY_GET` - Pop index and array, push element
- `ARRAY_SET` - Pop value, index, array; mutate array
- `ARRAY_PUSH` - Pop value and array, append to array
- `ARRAY_LEN` - Pop array, push length

### Dicts
- `MAKE_DICT #N` - Pop N key-value pairs, push dict
- `DICT_GET` - Pop key and dict, push value (or null)
- `DICT_SET` - Pop value, key, dict; mutate dict
- `DICT_HAS` - Pop key and dict, push boolean

### Unified Access
- `DOT_GET` - Pop index/key and array/dict, push value (null if missing)

### Strings
- `STR_CONCAT #N` - Pop N values, convert to strings, concatenate, push result

### Exceptions
- `PUSH_TRY .catch` - Register exception handler
- `PUSH_FINALLY .finally` - Add finally to current handler
- `POP_TRY` - Remove handler (try succeeded)
- `THROW` - Throw exception (pops error value)

## Compiler Patterns

### If-Else

```
<condition>
JUMP_IF_FALSE .else
  <then>
  JUMP .end
.else:
  <else>
.end:
```

### While Loop

```
.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  JUMP .loop
.end:
```

### For Loop

```
<init>
.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body>
  <increment>
  JUMP .loop
.end:
```

### Continue

There is no CONTINUE opcode.
Use a backward jump to the loop start:

```
.loop:
  <condition>
  JUMP_IF_FALSE .end
  <body before continue>
  <continue condition>
  JUMP_IF_TRUE .loop  ; continue
  <rest of body>
  JUMP .loop
.end:
```

### Break in Loop

Mark the iterator function as the break target and use the BREAK opcode:

```
MAKE_FUNCTION () .each_body
STORE each

LOAD collection
LOAD each
<call iterator>
HALT

.each_body:
  <condition>
  JUMP_IF_TRUE .done
  BREAK              ; exits to caller
.done:
  RETURN
```

### Short-Circuit AND

```
<left>
DUP
JUMP_IF_FALSE .end  ; Short-circuit if false
POP
<right>
.end:               ; Result on stack
```

### Short-Circuit OR

```
<left>
DUP
JUMP_IF_TRUE .end   ; Short-circuit if true
POP
<right>
.end:               ; Result on stack
```

### Try-Catch

```
PUSH_TRY .catch
  <try body>
  POP_TRY
  JUMP .end
.catch:
  STORE err
  <catch body>
.end:
```

### Try-Catch-Finally

```
PUSH_TRY .catch
PUSH_FINALLY .finally
  <try body>
  POP_TRY
  JUMP .finally   ; Compiler must generate this
.catch:
  STORE err
  <catch body>
  JUMP .finally   ; And this
.finally:
  <finally body>  ; Executes in both paths
.end:
```

**Important**: The VM only auto-jumps to finally on THROW. For a successful try or catch block, the compiler must explicitly JUMP to finally.

### Closures

Functions automatically capture current scope:

```
PUSH 0
STORE counter
MAKE_FUNCTION () .increment
RETURN

.increment:
  LOAD counter      ; Captured variable
  PUSH 1
  ADD
  STORE counter
  LOAD counter
  RETURN
```

### Tail Recursion

Use TAIL_CALL instead of CALL for the last call:

```
MAKE_FUNCTION (n acc) .factorial
STORE factorial
<...>

.factorial:
  LOAD n
  PUSH 0
  LTE
  JUMP_IF_FALSE .recurse
  LOAD acc
  RETURN
.recurse:
  LOAD factorial
  LOAD n
  PUSH 1
  SUB
  LOAD n
  LOAD acc
  MUL
  PUSH 2
  PUSH 0
  TAIL_CALL         ; Reuses stack frame
```

### Optional Function Calls (TRY_CALL)

Call function if defined, otherwise use value or name as string:

```
; Define optional hook
MAKE_FUNCTION () .onInit
STORE onInit

; Later: call if defined, skip if not
TRY_CALL onInit    ; Calls onInit() if it's a function
                   ; Pushes value if it exists but isn't a function
                   ; Pushes "onInit" as string if undefined

; Use with values
PUSH 42
STORE answer
TRY_CALL answer    ; Pushes 42 (not a function)

; Use with undefined
TRY_CALL unknown   ; Pushes "unknown" as string
```

**Use Cases**:
- Optional hooks/callbacks in DSLs
- Shell-like languages where unknown identifiers become strings
- Templating systems with optional transformers

### String Concatenation

Build strings from multiple values:

```
; Simple concatenation
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3      ; → "Hello World"

; With variables
PUSH "Name: "
LOAD userName
STR_CONCAT #2      ; → "Name: Alice"

; With expressions and type coercion
PUSH "Result: "
PUSH 10
PUSH 5
ADD
STR_CONCAT #2      ; → "Result: 15"

; Template-like interpolation
PUSH "User "
LOAD userId
PUSH " has "
LOAD count
PUSH " items"
STR_CONCAT #5      ; → "User 42 has 3 items"
```

**Composability**: Results can be concatenated again

```
PUSH "Hello"
PUSH " "
PUSH "World"
STR_CONCAT #3
PUSH "!"
STR_CONCAT #2      ; → "Hello World!"
```

### Unified Access (DOT_GET)

DOT_GET provides a single opcode for accessing both arrays and dicts:

```
; Array access
PUSH 10
PUSH 20
PUSH 30
MAKE_ARRAY #3
PUSH 1
DOT_GET            ; → 20

; Dict access
PUSH 'name'
PUSH 'Alice'
MAKE_DICT #1
PUSH 'name'
DOT_GET            ; → 'Alice'
```

**Chained access**:

```
; Access dict['users'][0]['name']
LOAD dict
PUSH 'users'
DOT_GET            ; Get users array
PUSH 0
DOT_GET            ; Get first user
PUSH 'name'
DOT_GET            ; Get name field
```

**With variables**:

```
LOAD data
LOAD key           ; Key can be string or number
DOT_GET            ; Works for both array and dict
```

**Null safety**: Returns null for missing keys or out-of-bounds indices

```
MAKE_ARRAY #0
PUSH 0
DOT_GET            ; → null (empty array)

MAKE_DICT #0
PUSH 'key'
DOT_GET            ; → null (missing key)
```

## Key Concepts

### Truthiness

Only `null` and `false` are falsy. Everything else (including `0`, `""`, empty arrays/dicts) is truthy.
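The truthiness rule above can be sketched as a small predicate. This is a minimal illustration, not ReefVM's actual code: the `Value` type is a hypothetical stand-in modeled on the `{ type, value }` result shape shown earlier in this guide, and `isTruthy` is an invented helper name.

```typescript
// Hypothetical Value shape (an assumption for this sketch, not the real ReefVM type).
type Value =
  | { type: "null"; value: null }
  | { type: "boolean"; value: boolean }
  | { type: "number"; value: number }
  | { type: "string"; value: string }
  | { type: "array"; value: Value[] }

// Only null and false are falsy; every other value is truthy.
function isTruthy(v: Value): boolean {
  if (v.type === "null") return false
  if (v.type === "boolean") return v.value
  return true // includes 0, "", and empty arrays
}
```

A compiler targeting Reef should keep this in mind when lowering conditions from a source language where `0` or `""` is falsy: `JUMP_IF_FALSE` will not branch on those values, so an explicit comparison (for example with `EQ`) may be needed first.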
### Type Coercion

**toNumber**:
- `number` → identity
- `string` → parseFloat (or 0 if invalid)
- `boolean` → 1 (true) or 0 (false)
- `null` → 0
- Others → 0

**toString**:
- `string` → identity
- `number` → string representation
- `boolean` → "true" or "false"
- `null` → "null"
- `function` → ""
- `array` → "[item, item]"
- `dict` → "{key: value, ...}"

**Arithmetic ops** (ADD, SUB, MUL, DIV, MOD) coerce both operands to numbers.

**Comparison ops** (LT, GT, LTE, GTE) coerce both operands to numbers.

**Equality ops** (EQ, NEQ) use type-aware comparison with deep equality for arrays/dicts.

**Note**: ADD does not concatenate strings; it coerces both operands to numbers. Use `STR_CONCAT` to build strings.

### Scope

- Variables are resolved through the parent scope chain
- STORE updates an existing variable or creates one in the current scope
- Functions capture scope at definition time

### Identifiers

Variable and function parameter names support Unicode and emoji:

- Valid: `💎`, `🌟`, `変数`, `counter`, `_private`
- Invalid: Cannot start with digits, `.`, `#`, `@`, or `...`
- Invalid: Cannot contain whitespace or special chars: `;`, `()`, `[]`, `{}`, `=`, `'`, `"`

### Break Semantics

- CALL marks current frame as break target
- BREAK unwinds call stack to that target
- Used for Ruby-style iterator pattern

### Parameter Binding Priority

For function calls, parameters are bound in this order:

1. Positional argument (if provided)
2. Named argument (if provided and matches param name)
3. Default value (if defined)
4. Null

### Exception Handlers

- PUSH_TRY uses absolute addresses for catch blocks
- Nested try blocks form a stack
- THROW unwinds to most recent handler and jumps to finally (if present) or catch
- VM does NOT automatically jump to finally on success - compiler must generate JUMPs
- Finally execution in all cases is compiler's responsibility, not VM's

### Calling Convention

All calls (including native functions) push arguments in order:

1. Function
2. Positional args (in order)
3. Named args (key1, val1, key2, val2, ...)
4. Positional count (as number)
5. Named count (as number)
6. CALL or TAIL_CALL

Native functions use the same calling convention as Reef functions. They are registered into scope and called via LOAD + CALL.

### Registering Native Functions

Native TypeScript functions are registered into the VM's scope and accessed like regular variables.

**Method 1**: Pass to `run()` or `VM` constructor

```typescript
const result = await run(bytecode, {
  add: (a: number, b: number) => a + b,
  greet: (name: string) => `Hello, ${name}!`
})

// Or with VM
const vm = new VM(bytecode, { add, greet })
```

**Method 2**: Register after construction

```typescript
const vm = new VM(bytecode)
vm.registerFunction('add', (a: number, b: number) => a + b)
await vm.run()
```

**Method 3**: Value-based functions (for full control)

```typescript
vm.registerValueFunction('customOp', (a: Value, b: Value): Value => {
  return { type: 'number', value: toNumber(a) + toNumber(b) }
})
```

**Auto-wrapping**: `registerFunction` automatically converts between native TypeScript types and ReefVM Value types. Both sync and async functions work.

**Usage in bytecode**:

```
; Positional arguments
LOAD add           ; Load native function from scope
PUSH 5
PUSH 10
PUSH 2             ; positionalCount
PUSH 0             ; namedCount
CALL               ; Call like any other function

; Named arguments
LOAD greet
PUSH "name"
PUSH "Alice"
PUSH "greeting"
PUSH "Hi"
PUSH 0             ; positionalCount
PUSH 2             ; namedCount
CALL               ; → "Hi, Alice!"
```

**Named Arguments**: Native functions support named arguments. Parameter names are extracted from the function signature at call time, and arguments are bound using the same priority as Reef functions (named arg > positional arg > default > null).
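The calling convention above is mechanical enough to generate from a helper. The following is a minimal sketch, assuming every argument is a constant atom that can be emitted with `PUSH`; a real compiler would emit arbitrary expression code for each argument instead. The names `emitCall`, `Atom`, and `Instruction` are hypothetical and are not part of ReefVM's API.

```typescript
// Hypothetical local types for this sketch; they mirror the array-format tuples in this guide.
type Atom = number | string | boolean | null
type Instruction = ["LOAD", string] | ["PUSH", Atom] | ["CALL"] | ["TAIL_CALL"]

// Emits: function, positional args, named key/value pairs, both counts, then the call opcode.
function emitCall(
  fnName: string,
  positional: Atom[],
  named: Record<string, Atom> = {},
  tail = false
): Instruction[] {
  const out: Instruction[] = [["LOAD", fnName]]
  for (const arg of positional) out.push(["PUSH", arg])

  const entries = Object.entries(named)
  for (const [key, value] of entries) {
    out.push(["PUSH", key])   // named arg key
    out.push(["PUSH", value]) // named arg value
  }

  out.push(["PUSH", positional.length]) // positional count
  out.push(["PUSH", entries.length])    // named count
  out.push(tail ? ["TAIL_CALL"] : ["CALL"])
  return out
}
```

For example, `emitCall("add", [5, 10])` yields the same instruction sequence as the positional `add` example above, expressed as array-format tuples that could be spliced into a larger program before passing it to `toBytecode()`.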
**@named Pattern**: Parameters starting with `at` followed by an uppercase letter (e.g., `atOptions`, `atNamed`) collect unmatched named arguments:

```typescript
// Basic @named - collects all named args
vm.registerFunction('greet', (atNamed: any = {}) => {
  return `Hello, ${atNamed.name || 'World'}!`
})

// Mixed positional and @named
vm.registerFunction('configure', (name: string, atOptions: any = {}) => {
  return {
    name,
    debug: atOptions.debug || false,
    port: atOptions.port || 3000
  }
})
```

Bytecode example:

```
; Call with mixed positional and named args
LOAD configure
PUSH "myApp"       ; positional arg → name
PUSH "debug"
PUSH true
PUSH "port"
PUSH 8080
PUSH 1             ; 1 positional arg
PUSH 2             ; 2 named args (debug, port)
CALL               ; atOptions receives {debug: true, port: 8080}
```

Named arguments that match fixed parameter names are bound to those parameters. Remaining unmatched named arguments are collected into the `atXxx` parameter as a plain JavaScript object.

### Calling Functions from TypeScript

You can call both Reef and native functions from TypeScript using `vm.call()`:

```typescript
const bytecode = toBytecode(`
  MAKE_FUNCTION (name greeting="Hello") .greet
  STORE greet
  HALT

  .greet:
    LOAD greeting
    PUSH " "
    LOAD name
    PUSH "!"
    STR_CONCAT #4
    RETURN
`)

const vm = new VM(bytecode, {
  log: (msg: string) => console.log(msg)  // Native function
})
await vm.run()

// Call Reef function with positional arguments
const result1 = await vm.call('greet', 'Alice')
// Returns: "Hello Alice!"

// Call Reef function with named arguments (pass as final object)
const result2 = await vm.call('greet', 'Bob', { greeting: 'Hi' })
// Returns: "Hi Bob!"

// Call Reef function with only named arguments
const result3 = await vm.call('greet', { name: 'Carol', greeting: 'Hey' })
// Returns: "Hey Carol!"

// Call native function
await vm.call('log', 'Hello from TypeScript!')
```

**How it works**:
- `vm.call(functionName, ...args)` looks up the function (Reef or native) in the VM's scope
- For Reef functions: converts to callable JavaScript function
- For native functions: calls directly
- Arguments are automatically converted to ReefVM Values
- Returns the result (automatically converted back to JavaScript types)

**Named arguments**: Pass a plain object as the final argument to provide named arguments. If the last argument is a non-array object, it's treated as named arguments. All preceding arguments are treated as positional.

**Type conversion**: Arguments and return values are automatically converted between JavaScript types and ReefVM Values:
- Primitives: `number`, `string`, `boolean`, `null`
- Arrays: converted recursively
- Objects: converted to ReefVM dicts
- Functions: Reef functions are converted to callable JavaScript functions

### REPL Mode (Incremental Compilation)

ReefVM supports incremental bytecode execution for building REPLs. This allows you to execute code line-by-line while preserving scope and avoiding re-execution of side effects.

**The Problem**: By default, `vm.run()` resets the program counter (PC) to 0, re-executing all previous bytecode. This makes it impossible to implement a REPL where each line executes only once.

**The Solution**: Use `vm.continue()` to resume execution from where you left off:

```typescript
// Line 1: Define variable
const line1 = toBytecode([
  ["PUSH", 42],
  ["STORE", "x"]
])

const vm = new VM(line1)
await vm.run()  // Execute first line

// Line 2: Use the variable
const line2 = toBytecode([
  ["LOAD", "x"],
  ["PUSH", 10],
  ["ADD"]
])

vm.appendBytecode(line2)  // Append new bytecode with proper constant remapping
await vm.continue()       // Execute ONLY the new bytecode

// Result: 52 (42 + 10)
// The first line never re-executed!
```

**Key methods**:
- `vm.run()`: Resets PC to 0 and runs from the beginning (normal execution)
- `vm.continue()`: Continues from current PC (REPL mode)
- `vm.appendBytecode(bytecode)`: Helper that properly appends bytecode with constant index remapping

**Important**: Don't use `HALT` in REPL mode! The VM naturally stops when it runs out of instructions. Using `HALT` sets `vm.stopped = true`, which prevents `continue()` from resuming.

**Example REPL pattern**:

```typescript
const vm = new VM(toBytecode([]), { /* native functions */ })

while (true) {
  const input = await getUserInput()   // Get next line from user
  const bytecode = compileLine(input)  // Compile to bytecode (no HALT!)

  vm.appendBytecode(bytecode)          // Append to VM
  const result = await vm.continue()   // Execute only the new code

  console.log(fromValue(result))       // Show result to user
}
```

This pattern ensures:
- Variables persist between lines
- Side effects (like `echo` or function calls) only run once
- Previous bytecode never re-executes
- Scope accumulates across all lines

### Empty Stack

- RETURN with empty stack returns null
- HALT with empty stack returns null
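
The empty-stack rule can be modeled as "take the top of the stack, or null when there is none". The sketch below illustrates only that rule. It assumes a hypothetical `Value` shape with a `"null"` variant and an invented helper name `topOrNull`; neither is ReefVM's actual implementation, and the guide does not say whether HALT pops or merely reads the top value.

```typescript
// Hypothetical Value shape for this sketch (an assumption, not the real ReefVM type).
type Value =
  | { type: "null"; value: null }
  | { type: "number"; value: number }
  | { type: "string"; value: string }

const NULL_VALUE: Value = { type: "null", value: null }

// Result of RETURN or HALT under the rule above: the top value, or null if the stack is empty.
function topOrNull(stack: Value[]): Value {
  const top = stack[stack.length - 1]
  return top === undefined ? NULL_VALUE : top
}
```

One consequence for compilers: a function body that ends in RETURN without pushing a value still produces a well-defined result, so statement-only bodies do not need an explicit `PUSH null`.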