# Reef Compiler Guide Quick reference for compiling to Reef bytecode. ## Bytecode Formats ReefVM supports two bytecode formats: 1. **String format**: Human-readable text with opcodes and operands 2. **Array format**: TypeScript arrays with typed tuples for programmatic generation Both formats are compiled using the same `toBytecode()` function. ## Bytecode Syntax ### Instructions ``` OPCODE operand ; comment ``` ### Operand Types **Immediate numbers** (`#N`): Counts or relative offsets - `MAKE_ARRAY #3` - count of 3 items - `JUMP #5` - relative offset of 5 instructions (prefer labels) - `PUSH_TRY #10` - absolute instruction index (prefer labels) **Labels** (`.name`): Symbolic addresses resolved at parse time - `.label:` - define label at current position - `JUMP .loop` - jump to label - `MAKE_FUNCTION (x) .body` - function body at label **Variable names**: Plain identifiers (supports Unicode and emoji!) - `LOAD counter` - load variable - `STORE result` - store variable - `LOAD 💎` - load emoji variable - `STORE 変数` - store Unicode variable **Constants**: Literals added to constants pool - Numbers: `PUSH 42`, `PUSH 3.14` - Strings: `PUSH "hello"` or `PUSH 'world'` - Booleans: `PUSH true`, `PUSH false` - Null: `PUSH null` **Native function names**: Registered TypeScript functions - `CALL_NATIVE print` ## Array Format The programmatic array format uses TypeScript tuples for type safety: ```typescript import { toBytecode, run } from "#reef" const bytecode = toBytecode([ ["PUSH", 42], // Atom values: number | string | boolean | null ["STORE", "x"], // Variable names as strings ["LOAD", "x"], ["HALT"] ]) const result = await run(bytecode) ``` ### Operand Types in Array Format **Atoms** (`number | string | boolean | null`): Constants for PUSH ```typescript ["PUSH", 42] ["PUSH", "hello"] ["PUSH", true] ["PUSH", null] ``` **Variable names**: String identifiers ```typescript ["LOAD", "counter"] ["STORE", "result"] ``` **Label definitions**: Single-element arrays starting with `.` and ending with `:` ```typescript [".loop:"] [".end:"] [".function_body:"] ``` **Label references**: Strings in jump/function instructions ```typescript ["JUMP", ".loop"] ["JUMP_IF_FALSE", ".end"] ["MAKE_FUNCTION", ["x", "y"], ".body"] ["PUSH_TRY", ".catch"] ``` **Counts**: Numbers for array/dict construction ```typescript ["MAKE_ARRAY", 3] // Pop 3 items ["MAKE_DICT", 2] // Pop 2 key-value pairs ``` **Native function names**: Strings for registered functions ```typescript ["CALL_NATIVE", "print"] ``` ### Functions in Array Format ```typescript // Basic function ["MAKE_FUNCTION", ["x", "y"], ".body"] // With defaults ["MAKE_FUNCTION", ["x", "y=10"], ".body"] // Variadic ["MAKE_FUNCTION", ["...args"], ".body"] // Named args ["MAKE_FUNCTION", ["@opts"], ".body"] // Mixed ["MAKE_FUNCTION", ["x", "y=5", "...rest", "@opts"], ".body"] ``` ### Complete Example ```typescript const factorial = toBytecode([ ["MAKE_FUNCTION", ["n", "acc=1"], ".fact"], ["STORE", "factorial"], ["JUMP", ".main"], [".fact:"], ["LOAD", "n"], ["PUSH", 0], ["LTE"], ["JUMP_IF_FALSE", ".recurse"], ["LOAD", "acc"], ["RETURN"], [".recurse:"], ["LOAD", "factorial"], ["LOAD", "n"], ["PUSH", 1], ["SUB"], ["LOAD", "n"], ["LOAD", "acc"], ["MUL"], ["PUSH", 2], ["PUSH", 0], ["TAIL_CALL"], [".main:"], ["LOAD", "factorial"], ["PUSH", 5], ["PUSH", 1], ["PUSH", 0], ["CALL"], ["HALT"] ]) const result = await run(factorial) // { type: "number", value: 120 } ``` ## String Format ### Functions ``` MAKE_FUNCTION (x y) .body ; Basic MAKE_FUNCTION (x=10 y=20) .body ; Defaults MAKE_FUNCTION (x ...rest) .body ; Variadic MAKE_FUNCTION (x @named) .body ; Named args MAKE_FUNCTION (x ...rest @named) .body ; Both ``` ### Function Calls Stack order (bottom to top): ``` LOAD fn PUSH arg1 ; Positional args PUSH arg2 PUSH "name" ; Named arg key PUSH "value" ; Named arg value PUSH 2 ; Positional count PUSH 1 ; Named count CALL ``` ## Opcodes ### Stack - `PUSH ` - Push constant - `POP` - Remove top - `DUP` - Duplicate top ### Variables - `LOAD ` - Push variable value - `STORE ` - Pop and store in variable ### Arithmetic - `ADD`, `SUB`, `MUL`, `DIV`, `MOD` - Binary ops (pop 2, push result) ### Comparison - `EQ`, `NEQ`, `LT`, `GT`, `LTE`, `GTE` - Pop 2, push boolean ### Logic - `NOT` - Pop 1, push !value ### Control Flow - `JUMP .label` - Unconditional jump - `JUMP_IF_FALSE .label` - Jump if top is false or null (pops value) - `JUMP_IF_TRUE .label` - Jump if top is truthy (pops value) - `HALT` - Stop execution of the program ### Functions - `MAKE_FUNCTION (params) .body` - Create function, push to stack - `CALL` - Call function (see calling convention above) - `TAIL_CALL` - Tail-recursive call (no stack growth) - `RETURN` - Return from function (pops return value) - `BREAK` - Exit iterator/loop (unwinds to break target) ### Arrays - `MAKE_ARRAY #N` - Pop N items, push array - `ARRAY_GET` - Pop index and array, push element - `ARRAY_SET` - Pop value, index, array; mutate array - `ARRAY_PUSH` - Pop value and array, append to array - `ARRAY_LEN` - Pop array, push length ### Dicts - `MAKE_DICT #N` - Pop N key-value pairs, push dict - `DICT_GET` - Pop key and dict, push value (or null) - `DICT_SET` - Pop value, key, dict; mutate dict - `DICT_HAS` - Pop key and dict, push boolean ### Exceptions - `PUSH_TRY .catch` - Register exception handler - `PUSH_FINALLY .finally` - Add finally to current handler - `POP_TRY` - Remove handler (try succeeded) - `THROW` - Throw exception (pops error value) ### Native - `CALL_NATIVE ` - Call registered TypeScript function (consumes entire stack as args) ## Compiler Patterns ### If-Else ``` JUMP_IF_FALSE .else JUMP .end .else: .end: ``` ### While Loop ``` .loop: JUMP_IF_FALSE .end JUMP .loop .end: ``` ### For Loop ``` .loop: JUMP_IF_FALSE .end JUMP .loop .end: ``` ### Continue No CONTINUE opcode. Use backward jump to loop start: ``` .loop: JUMP_IF_FALSE .end JUMP_IF_TRUE .loop ; continue JUMP .loop .end: ``` ### Break in Loop Mark iterator function as break target, use BREAK opcode: ``` MAKE_FUNCTION () .each_body STORE each LOAD collection LOAD each HALT .each_body: JUMP_IF_TRUE .done BREAK ; exits to caller .done: RETURN ``` ### Short-Circuit AND ``` DUP JUMP_IF_FALSE .end ; Short-circuit if false POP .end: ; Result on stack ``` ### Short-Circuit OR ``` DUP JUMP_IF_TRUE .end ; Short-circuit if true POP .end: ; Result on stack ``` ### Try-Catch ``` PUSH_TRY .catch POP_TRY JUMP .end .catch: STORE err .end: ``` ### Try-Catch-Finally ``` PUSH_TRY .catch PUSH_FINALLY .finally POP_TRY JUMP .finally ; Compiler must generate this .catch: STORE err JUMP .finally ; And this .finally: ; Executes in both paths .end: ``` **Important**: VM only auto-jumps to finally on THROW. For successful try/catch, compiler must explicitly JUMP to finally. ### Closures Functions automatically capture current scope: ``` PUSH 0 STORE counter MAKE_FUNCTION () .increment RETURN .increment: LOAD counter ; Captured variable PUSH 1 ADD STORE counter LOAD counter RETURN ``` ### Tail Recursion Use TAIL_CALL instead of CALL for last call: ``` MAKE_FUNCTION (n acc) .factorial STORE factorial <...> .factorial: LOAD n PUSH 0 LTE JUMP_IF_FALSE .recurse LOAD acc RETURN .recurse: LOAD factorial LOAD n PUSH 1 SUB LOAD n LOAD acc MUL PUSH 2 PUSH 0 TAIL_CALL ; Reuses stack frame ``` ## Key Concepts ### Truthiness Only `null` and `false` are falsy. Everything else (including `0`, `""`, empty arrays/dicts) is truthy. ### Type Coercion **toNumber**: - `number` → identity - `string` → parseFloat (or 0 if invalid) - `boolean` → 1 (true) or 0 (false) - `null` → 0 - Others → 0 **toString**: - `string` → identity - `number` → string representation - `boolean` → "true" or "false" - `null` → "null" - `function` → "" - `array` → "[item, item]" - `dict` → "{key: value, ...}" **Arithmetic ops** (ADD, SUB, MUL, DIV, MOD) coerce both operands to numbers. **Comparison ops** (LT, GT, LTE, GTE) coerce both operands to numbers. **Equality ops** (EQ, NEQ) use type-aware comparison with deep equality for arrays/dicts. **Note**: There is no string concatenation operator. ADD only works with numbers. ### Scope - Variables resolved through parent scope chain - STORE updates existing variable or creates in current scope - Functions capture scope at definition time ### Identifiers Variable and function parameter names support Unicode and emoji: - Valid: `💎`, `🌟`, `変数`, `counter`, `_private` - Invalid: Cannot start with digits, `.`, `#`, `@`, or `...` - Invalid: Cannot contain whitespace or special chars: `;`, `()`, `[]`, `{}`, `=`, `'`, `"` ### Break Semantics - CALL marks current frame as break target - BREAK unwinds call stack to that target - Used for Ruby-style iterator pattern ### Parameter Binding Priority For function calls, parameters bound in order: 1. Positional argument (if provided) 2. Named argument (if provided and matches param name) 3. Default value (if defined) 4. Null ### Exception Handlers - PUSH_TRY uses absolute addresses for catch blocks - Nested try blocks form a stack - THROW unwinds to most recent handler and jumps to finally (if present) or catch - VM does NOT automatically jump to finally on success - compiler must generate JUMPs - Finally execution in all cases is compiler's responsibility, not VM's ### Calling Convention All calls push arguments in order: 1. Function 2. Positional args (in order) 3. Named args (key1, val1, key2, val2, ...) 4. Positional count (as number) 5. Named count (as number) 6. CALL or TAIL_CALL ### CALL_NATIVE Behavior Unlike CALL, CALL_NATIVE consumes the **entire stack** as arguments and clears the stack. The native function receives all values that were on the stack at the time of the call. ### Empty Stack - RETURN with empty stack returns null - HALT with empty stack returns null