simpler native functions

Chris Wanstrath 2025-10-17 12:48:16 -07:00
parent 93eff53a76
commit fe7586a5fa
11 changed files with 162 additions and 121 deletions


@ -42,8 +42,7 @@ No build step required - Bun runs TypeScript directly.
- Stack-based execution with program counter (PC)
- Call stack for function frames
- Exception handler stack for try/catch/finally
- Lexical scope chain with parent references
- Native function registry for TypeScript interop
- Lexical scope chain with parent references (includes native functions)
**Key subsystems**:
- **bytecode.ts**: Compiler that converts both string and array formats to executable bytecode. Handles label resolution, constant pool management, and function definition parsing. The `toBytecode()` function accepts either a string (human-readable) or typed array format (programmatic).
@ -70,7 +69,7 @@ No build step required - Bun runs TypeScript directly.
**Parameter binding priority**: Named args bind to fixed params first. Unmatched named args go to `@named` dict parameter. Fixed params bind in order: named arg > positional arg > default > null.
**Native function calling**: CALL_NATIVE consumes the entire stack as arguments (different from CALL which pops specific argument counts).
**Native function calling**: Native functions are stored in scope and called via LOAD + CALL, using the same calling convention as Reef functions. They do not support named arguments.
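To illustrate what "stored in scope" means for lookup, here is a minimal sketch of a scope chain that holds a native function next to ordinary variables. `ScopeSketch` and the trimmed `Value` type are hypothetical stand-ins, not the VM's real classes; only the `native` value shape and the `Undefined variable` error message are taken from this commit.

```typescript
// Simplified stand-ins; the real Value and Scope types are richer.
type Value =
  | { type: 'number'; value: number }
  | { type: 'native'; fn: (...args: Value[]) => Value; value: '<function>' }

class ScopeSketch {
  private vars = new Map<string, Value>()
  private parent: ScopeSketch | undefined

  constructor(parent?: ScopeSketch) {
    this.parent = parent
  }

  // Define a variable in this scope only (the real STORE also updates parents).
  set(name: string, value: Value): void {
    this.vars.set(name, value)
  }

  // What LOAD does in this sketch: search this scope, then each parent.
  get(name: string): Value {
    const found = this.vars.get(name)
    if (found !== undefined) return found
    if (this.parent) return this.parent.get(name)
    throw new Error(`Undefined variable: ${name}`)
  }
}

// Registering a native function is just a write into the outermost scope.
const globals = new ScopeSketch()
globals.set('add', {
  type: 'native',
  value: '<function>',
  fn: (a, b) => {
    if (a.type !== 'number' || b.type !== 'number') throw new Error('add: expected numbers')
    return { type: 'number', value: a.value + b.value }
  },
})

// A nested scope (for example a function body) still resolves the native.
const inner = new ScopeSketch(globals)
const loaded = inner.get('add')
console.log(loaded.type) // prints "native"
```

Because the lookup is the ordinary variable lookup, a missing native function surfaces as an undefined-variable error rather than a dedicated "function not found" error.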
## Testing Strategy
@ -372,8 +371,6 @@ Run `bun test` to verify all tests pass before committing.
**MAKE_ARRAY operand**: Specifies count, not a stack index. `MAKE_ARRAY #3` pops 3 items.
**CALL_NATIVE stack behavior**: Unlike CALL, it consumes all stack values as arguments and clears the stack.
**Finally blocks**: The compiler must generate explicit JUMPs to finally blocks for successful try/catch completion. The VM only auto-jumps to finally on THROW.
**Variable scoping**: STORE updates existing variables in parent scopes or creates in current scope. It does NOT shadow by default.


@ -42,9 +42,6 @@ OPCODE operand ; comment
- Booleans: `PUSH true`, `PUSH false`
- Null: `PUSH null`
**Native function names**: Registered TypeScript functions
- `CALL_NATIVE print`
## Array Format
The programmatic array format uses TypeScript tuples for type safety:
@ -99,11 +96,6 @@ const result = await run(bytecode)
["MAKE_DICT", 2] // Pop 2 key-value pairs
```
**Native function names**: Strings for registered functions
```typescript
["CALL_NATIVE", "print"]
```
### Functions in Array Format
```typescript
@ -247,9 +239,6 @@ CALL
- `POP_TRY` - Remove handler (try succeeded)
- `THROW` - Throw exception (pops error value)
### Native
- `CALL_NATIVE <name>` - Call registered TypeScript function (consumes entire stack as args)
## Compiler Patterns
### If-Else
@ -589,7 +578,7 @@ For function calls, parameters bound in order:
- Finally execution in all cases is compiler's responsibility, not VM's
### Calling Convention
All calls push arguments in order:
All calls (including native functions) push arguments in order:
1. Function
2. Positional args (in order)
3. Named args (key1, val1, key2, val2, ...)
@ -597,11 +586,12 @@ All calls push arguments in order:
5. Named count (as number)
6. CALL or TAIL_CALL
### CALL_NATIVE Behavior
Unlike CALL, CALL_NATIVE consumes the **entire stack** as arguments and clears the stack. The native function receives all values that were on the stack at the time of the call.
Native functions use the same calling convention as Reef functions. They are registered into scope and called via LOAD + CALL.
### Registering Native Functions
Native TypeScript functions are registered into the VM's scope and accessed like regular variables.
**Method 1**: Pass to `run()` or `VM` constructor
```typescript
const result = await run(bytecode, {
@ -613,14 +603,33 @@ const result = await run(bytecode, {
const vm = new VM(bytecode, { add, greet })
```
**Method 2**: Register manually
**Method 2**: Register after construction
```typescript
const vm = new VM(bytecode)
vm.registerFunction('add', (a, b) => a + b)
vm.registerFunction('add', (a: number, b: number) => a + b)
await vm.run()
```
Functions are auto-wrapped to convert between native TypeScript and ReefVM Value types. Both sync and async functions work.
**Method 3**: Value-based functions (for full control)
```typescript
vm.registerValueFunction('customOp', (a: Value, b: Value): Value => {
return { type: 'number', value: toNumber(a) + toNumber(b) }
})
```
**Auto-wrapping**: `registerFunction` automatically converts between native TypeScript types and ReefVM Value types. Both sync and async functions work.
**Usage in bytecode**:
```
LOAD add ; Load native function from scope
PUSH 5
PUSH 10
PUSH 2 ; positionalCount
PUSH 0 ; namedCount
CALL ; Call like any other function
```
**Limitations**: Native functions do not support named arguments (namedCount must be 0).
### Empty Stack
- RETURN with empty stack returns null


@ -46,7 +46,8 @@ Commands: `clear`, `reset`, `exit`.
- Mixed positional and named arguments with proper priority binding
- Tail call optimization with unbounded recursion (10,000+ iterations without stack overflow)
- Exception handling (PUSH_TRY, PUSH_FINALLY, POP_TRY, THROW) with nested try/finally blocks and call stack unwinding
- Native function interop (CALL_NATIVE) with auto-wrapping for native TypeScript types
- Native function interop with auto-wrapping for native TypeScript types
- Native functions stored in scope, called via LOAD + CALL
- Pass functions directly to `run(bytecode, { fnName: fn })` or `new VM(bytecode, { fnName: fn })`
## Design Decisions

SPEC.md

@ -13,10 +13,9 @@ The ReefVM is a stack-based bytecode virtual machine designed for the Shrimp pro
- **Value Stack**: Operand stack for computation
- **Call Stack**: Call frames for function invocations
- **Exception Handlers**: Stack of try/catch handlers
- **Scope Chain**: Linked scopes for lexical variable resolution
- **Scope Chain**: Linked scopes for lexical variable resolution (includes native functions)
- **Program Counter (PC)**: Current instruction index
- **Constants Pool**: Immutable values and function metadata
- **Native Function Registry**: External functions callable from Shrimp
### Execution Model
@ -40,6 +39,7 @@ type Value =
| { type: 'dict', value: Map<string, Value> }
| { type: 'function', params: string[], defaults: Record<string, number>,
body: number, parentScope: Scope, variadic: boolean, named: boolean }
| { type: 'native', fn: NativeFunction, value: '<function>' }
```
### Type Coercion
@ -357,15 +357,20 @@ The created function captures `currentScope` as its `parentScope`.
3. Pop named arguments (name/value pairs) from stack
4. Pop positional arguments from stack
5. Pop function from stack
6. Mark current frame (if exists) as break target (`isBreakTarget = true`)
7. Push new call frame with current PC and scope
8. Create new scope with function's parentScope as parent
9. Bind parameters:
6. **If function is native**:
- Mark current frame (if exists) as break target
- Call native function with positional args
- Push return value onto stack
- Done (skip steps 7-12)
7. Mark current frame (if exists) as break target (`isBreakTarget = true`)
8. Push new call frame with current PC and scope
9. Create new scope with function's parentScope as parent
10. Bind parameters:
- For regular functions: bind params by position, then by name, then defaults, then null
- For variadic functions: bind fixed params, collect rest into array
- For functions with `named: true`: bind fixed params by position/name, collect unmatched named args into dict
10. Set currentScope to new scope
11. Jump to function body
11. Set currentScope to new scope
12. Jump to function body
**Parameter Binding Priority** (for fixed params):
1. Named argument (if provided and matches param name)
@ -377,8 +382,9 @@ The created function captures `currentScope` as its `parentScope`.
- Named args that match fixed parameter names are bound to those params
- If the function has `named: true`, remaining named args (that don't match any fixed param) are collected into the last parameter as a dict
- This allows flexible calling: `fn(x=10, y=20, extra=30)` where `extra` goes to the named args dict
- **Native functions do not support named arguments** - if namedCount > 0 for a native function, CALL will throw an error
**Errors**: Throws if top of stack is not a function
**Errors**: Throws if top of stack is not a function (or native function)
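The native branch of CALL can be sketched on a bare stack as follows. This is an illustration under assumptions, not the VM's code: `callNativeSketch`, `popCount`, and the trimmed `Value` type are hypothetical, break-target marking is omitted, and the call is synchronous even though the spec says native functions may also be async.

```typescript
// Simplified stand-ins for the VM's types; not the real definitions.
type NativeFunction = (...args: Value[]) => Value

type Value =
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'native'; fn: NativeFunction; value: '<function>' }

// Pop a count (namedCount or positionalCount) off the stack.
function popCount(stack: Value[]): number {
  const top = stack.pop()
  if (!top || top.type !== 'number') throw new Error('CALL: expected a count')
  return top.value
}

// Hypothetical helper: performs the native branch of CALL on a bare stack.
function callNativeSketch(stack: Value[]): void {
  const namedCount = popCount(stack)
  const positionalCount = popCount(stack)
  const needed = namedCount * 2 + positionalCount + 1
  if (stack.length < needed) throw new Error('CALL: stack underflow')

  // Named args sit above the positional args as key/value pairs.
  stack.splice(stack.length - namedCount * 2, namedCount * 2)
  // splice returns the positional args in the order they were pushed.
  const args = stack.splice(stack.length - positionalCount, positionalCount)

  const fn = stack.pop()
  if (!fn || fn.type !== 'native') throw new Error('CALL: not a native function')
  if (namedCount > 0)
    throw new Error('CALL: native functions do not support named arguments')

  // No call frame or new scope is created; the result goes straight back.
  stack.push(fn.fn(...args))
}

const num = (value: number): Value => ({ type: 'number', value })

const add: Value = {
  type: 'native',
  value: '<function>',
  fn: (a, b) => {
    if (a.type !== 'number' || b.type !== 'number') throw new Error('add: expected numbers')
    return num(a.value + b.value)
  },
}

// Equivalent of: LOAD add; PUSH 5; PUSH 10; PUSH 2; PUSH 0; CALL
const stack: Value[] = [add, num(5), num(10), num(2), num(0)]
callNativeSketch(stack)
console.log(stack) // a single number Value holding 15
```

The sketch pops everything before rejecting named arguments, mirroring the order in the VM hunk of this commit, where the check runs after the function has been popped.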
#### TAIL_CALL
**Operand**: None
@ -606,28 +612,62 @@ STR_CONCAT #4 ; → "Count: 42, Active: true"
### TypeScript Interop
#### CALL_NATIVE
**Operand**: Function name (string)
**Effect**: Call registered TypeScript function
**Stack**: [...args] → [returnValue]
Native TypeScript functions are registered into the VM's scope and accessed via regular LOAD/CALL operations. From the bytecode's perspective they are called like Reef functions, except that named arguments are not supported.
**Behavior**:
1. Look up function by name in registry
2. Mark current frame (if exists) as break target
3. Await function call (native function receives arguments and returns a Value)
4. Push return value onto stack
**Notes**:
- TypeScript functions are passed the raw stack values as arguments
- They must return a valid Value
- They can be async (VM awaits them)
- Like CALL, but function is from TypeScript registry instead of stack
**Errors**: Throws if function not found
**TypeScript Function Signature**:
**Registration**:
```typescript
type TypeScriptFunction = (...args: Value[]) => Promise<Value> | Value;
const vm = new VM(bytecode, {
add: (a: number, b: number) => a + b,
greet: (name: string) => `Hello, ${name}!`
})
// Or after construction:
vm.registerFunction('multiply', (a: number, b: number) => a * b)
```
**Usage in Bytecode**:
```
LOAD add ; Load native function from scope
PUSH 5
PUSH 10
PUSH 2 ; positionalCount
PUSH 0 ; namedCount
CALL ; Call it like any other function
```
**Native Function Types**:
1. **Auto-wrapped functions** (via `registerFunction`): Accept and return native TypeScript types (number, string, boolean, array, object, etc.). The VM automatically converts between Value types and native types.
2. **Value-based functions** (via `registerValueFunction`): Accept and return `Value` types directly for full control over type handling.
**Auto-Wrapping Behavior**:
- Parameters: `Value` → native type (number, string, boolean, array, object, null, RegExp)
- Return value: native type → `Value`
- Supports sync and async functions
- Objects convert to dicts, arrays convert to Value arrays
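The conversion rules above could look roughly like this. It is a sketch only: `toValueSketch`, `fromValueSketch`, and `wrapNativeSketch` are hypothetical names, the `array` variant of `Value` is assumed (only the `dict` shape appears in this diff), RegExp handling is left out, and async functions are not handled.

```typescript
// Assumed subset of the Value type; the array variant is a guess at its shape.
type Value =
  | { type: 'null'; value: null }
  | { type: 'boolean'; value: boolean }
  | { type: 'number'; value: number }
  | { type: 'string'; value: string }
  | { type: 'array'; value: Value[] }
  | { type: 'dict'; value: Map<string, Value> }

// Native type -> Value (used for return values).
function toValueSketch(x: unknown): Value {
  if (x === null || x === undefined) return { type: 'null', value: null }
  if (typeof x === 'boolean') return { type: 'boolean', value: x }
  if (typeof x === 'number') return { type: 'number', value: x }
  if (typeof x === 'string') return { type: 'string', value: x }
  if (Array.isArray(x)) return { type: 'array', value: x.map(item => toValueSketch(item)) }
  if (typeof x === 'object') {
    // Plain objects become dicts keyed by property name.
    const dict = new Map<string, Value>()
    const record = x as Record<string, unknown>
    for (const key of Object.keys(record)) dict.set(key, toValueSketch(record[key]))
    return { type: 'dict', value: dict }
  }
  throw new Error(`cannot convert ${typeof x} to a Value`)
}

// Value -> native type (used for parameters).
function fromValueSketch(v: Value): unknown {
  switch (v.type) {
    case 'array':
      return v.value.map(item => fromValueSketch(item))
    case 'dict': {
      const obj: Record<string, unknown> = {}
      v.value.forEach((item, key) => { obj[key] = fromValueSketch(item) })
      return obj
    }
    default:
      return v.value
  }
}

// Wrap a plain function so it accepts and returns Values (sync only here).
function wrapNativeSketch(fn: (...args: any[]) => unknown): (...args: Value[]) => Value {
  return (...args) => toValueSketch(fn(...args.map(arg => fromValueSketch(arg))))
}

const wrappedAdd = wrapNativeSketch((a: number, b: number) => a + b)
const sum = wrappedAdd({ type: 'number', value: 5 }, { type: 'number', value: 10 })
console.log(sum) // { type: 'number', value: 15 }

const wrappedUser = wrapNativeSketch((name: string, age: number) => ({ name, age }))
const user = wrappedUser({ type: 'string', value: 'Alice' }, { type: 'number', value: 30 })
console.log(user.type) // prints "dict"
```

A Value-based function registered via `registerValueFunction` would skip this wrapping step entirely and work with `Value` objects directly.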
**Limitations**:
- Native functions do not support named arguments
- If called with named arguments (namedCount > 0), CALL throws an error
**Examples**:
```typescript
// Auto-wrapped native types
vm.registerFunction('add', (a: number, b: number) => a + b)
vm.registerFunction('greet', (name: string) => `Hello, ${name}!`)
vm.registerFunction('range', (n: number) => Array.from({ length: n }, (_, i) => i))
// Value-based for custom logic
vm.registerValueFunction('customOp', (a: Value, b: Value): Value => {
return { type: 'number', value: toNumber(a) + toNumber(b) }
})
// Async functions
vm.registerFunction('fetchData', async (url: string) => {
const response = await fetch(url)
return response.json()
})
```
### Special
@ -787,10 +827,9 @@ All of these should throw errors:
6. **Break Outside Loop**: BREAK with no break target
7. **Continue Outside Loop**: CONTINUE with no continue target
8. **Return Outside Function**: RETURN with no call frame
9. **Unknown Function**: CALL_NATIVE with unregistered function
10. **Mismatched Handler**: POP_TRY with no handler
11. **Invalid Constant**: PUSH with invalid constant index
12. **Invalid Function Definition**: MAKE_FUNCTION with non-function_def constant
9. **Mismatched Handler**: POP_TRY with no handler
10. **Invalid Constant**: PUSH with invalid constant index
11. **Invalid Function Definition**: MAKE_FUNCTION with non-function_def constant
## Edge Cases
@ -835,11 +874,21 @@ All of these should throw errors:
## VM Initialization
```typescript
const vm = new VM(bytecode);
vm.registerFunction('add', (a, b) => {
// Register native functions during construction
const vm = new VM(bytecode, {
add: (a: number, b: number) => a + b,
greet: (name: string) => `Hello, ${name}!`
})
// Or register after construction
vm.registerFunction('multiply', (a: number, b: number) => a * b)
// Or use Value-based functions
vm.registerValueFunction('customOp', (a: Value, b: Value): Value => {
return { type: 'number', value: toNumber(a) + toNumber(b) }
})
const result = await vm.execute()
const result = await vm.run()
```
## Testing Considerations


@ -73,8 +73,8 @@ type InstructionTuple =
// Strings
| ["STR_CONCAT", number]
// Native
| ["LOAD_NATIVE", string]
// Arrays and dicts
| ["DOT_GET"]
// Special
| ["HALT"]
@ -336,7 +336,6 @@ function toBytecodeFromArray(program: ProgramItem[]): Bytecode /* throws */ {
case "STORE":
case "TRY_LOAD":
case "TRY_CALL":
case "LOAD_NATIVE":
operandValue = operand as string
break


@ -65,9 +65,6 @@ export enum OpCode {
// strings
STR_CONCAT, // operand: value count (number) | stack: [val1, ..., valN] → [string] | concatenate N values
// typescript interop
LOAD_NATIVE, // operand: function name (identifier) | stack: [] → [function] | load native function
// special
HALT // operand: none | stop execution
}


@ -45,7 +45,6 @@ const OPCODES_WITH_OPERANDS = new Set([
OpCode.MAKE_DICT,
OpCode.STR_CONCAT,
OpCode.MAKE_FUNCTION,
OpCode.LOAD_NATIVE,
])
const OPCODES_WITHOUT_OPERANDS = new Set([
@ -77,6 +76,7 @@ const OPCODES_WITHOUT_OPERANDS = new Set([
OpCode.DICT_GET,
OpCode.DICT_SET,
OpCode.DICT_HAS,
OpCode.DOT_GET,
])
// immediate = immediate number, eg #5


@ -32,11 +32,11 @@ export class VM {
registerFunction(name: string, fn: Function) {
const wrapped = isWrapped(fn) ? fn as NativeFunction : wrapNative(fn)
this.nativeFunctions.set(name, wrapped)
this.scope.set(name, { type: 'native', fn: wrapped, value: '<function>' })
}
registerValueFunction(name: string, fn: NativeFunction) {
this.nativeFunctions.set(name, fn)
this.scope.set(name, { type: 'native', fn, value: '<function>' })
}
async run(): Promise<Value> {
@ -431,7 +431,7 @@ export class VM {
const fn = this.stack.pop()!
// Handle native functions
if (fn.type === 'native_function') {
if (fn.type === 'native') {
if (namedCount > 0)
throw new Error('CALL: native functions do not support named arguments')
@ -606,17 +606,6 @@ export class VM {
this.stack.push(returnValue)
break
case OpCode.LOAD_NATIVE: {
const functionName = instruction.operand as string
const nativeFunc = this.nativeFunctions.get(functionName)
if (!nativeFunc)
throw new Error(`LOAD_NATIVE: function not found: ${functionName}`)
this.stack.push({ type: 'native_function', fn: nativeFunc, value: '<native>' })
break
}
default:
throw `Unknown op: ${instruction.op}`
}


@ -5,7 +5,7 @@ import { toBytecode } from "#bytecode"
describe("functions parameter", () => {
test("pass functions to run()", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE add
LOAD add
PUSH 5
PUSH 3
PUSH 2
@ -23,7 +23,7 @@ describe("functions parameter", () => {
test("pass functions to VM constructor", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE multiply
LOAD multiply
PUSH 10
PUSH 2
PUSH 2
@ -42,14 +42,14 @@ describe("functions parameter", () => {
test("pass multiple functions", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE add
LOAD add
PUSH 10
PUSH 5
PUSH 2
PUSH 0
CALL
STORE sum
LOAD_NATIVE multiply
LOAD multiply
LOAD sum
PUSH 3
PUSH 2
@ -68,7 +68,7 @@ describe("functions parameter", () => {
test("auto-wraps native functions", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE concat
LOAD concat
PUSH "hello"
PUSH "world"
PUSH 2
@ -86,7 +86,7 @@ describe("functions parameter", () => {
test("works with async functions", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE delay
LOAD delay
PUSH 100
PUSH 1
PUSH 0
@ -106,14 +106,14 @@ describe("functions parameter", () => {
test("can combine with manual registerFunction", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE add
LOAD add
PUSH 5
PUSH 3
PUSH 2
PUSH 0
CALL
STORE sum
LOAD_NATIVE subtract
LOAD subtract
LOAD sum
PUSH 2
PUSH 2
@ -155,7 +155,7 @@ describe("functions parameter", () => {
test("function throws error", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE divide
LOAD divide
PUSH 0
PUSH 1
PUSH 0
@ -178,21 +178,21 @@ describe("functions parameter", () => {
test("complex workflow with multiple function calls", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE add
LOAD add
PUSH 5
PUSH 3
PUSH 2
PUSH 0
CALL
STORE result
LOAD_NATIVE multiply
LOAD multiply
LOAD result
PUSH 2
PUSH 2
PUSH 0
CALL
STORE final
LOAD_NATIVE format
LOAD format
LOAD final
PUSH 1
PUSH 0
@ -211,7 +211,7 @@ describe("functions parameter", () => {
test("function overriding - later registration wins", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE getValue
LOAD getValue
PUSH 5
PUSH 1
PUSH 0


@ -3,9 +3,9 @@ import { VM } from "#vm"
import { toBytecode } from "#bytecode"
import { toValue, toNumber, toString } from "#value"
test("LOAD_NATIVE - basic function call", async () => {
test("LOAD - basic function call", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE add
LOAD add
PUSH 5
PUSH 10
PUSH 2
@ -24,9 +24,9 @@ test("LOAD_NATIVE - basic function call", async () => {
expect(result).toEqual({ type: 'number', value: 15 })
})
test("LOAD_NATIVE - function with string manipulation", async () => {
test("LOAD - function with string manipulation", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE concat
LOAD concat
PUSH "hello"
PUSH "world"
PUSH 2
@ -46,9 +46,9 @@ test("LOAD_NATIVE - function with string manipulation", async () => {
expect(result).toEqual({ type: 'string', value: 'hello world' })
})
test("LOAD_NATIVE - async function", async () => {
test("LOAD - async function", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE asyncDouble
LOAD asyncDouble
PUSH 42
PUSH 1
PUSH 0
@ -67,9 +67,9 @@ test("LOAD_NATIVE - async function", async () => {
expect(result).toEqual({ type: 'number', value: 84 })
})
test("LOAD_NATIVE - function with no arguments", async () => {
test("LOAD - function with no arguments", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE getAnswer
LOAD getAnswer
PUSH 0
PUSH 0
CALL
@ -85,9 +85,9 @@ test("LOAD_NATIVE - function with no arguments", async () => {
expect(result).toEqual({ type: 'number', value: 42 })
})
test("LOAD_NATIVE - function with multiple arguments", async () => {
test("LOAD - function with multiple arguments", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE sum
LOAD sum
PUSH 2
PUSH 3
PUSH 4
@ -107,9 +107,9 @@ test("LOAD_NATIVE - function with multiple arguments", async () => {
expect(result).toEqual({ type: 'number', value: 9 })
})
test("LOAD_NATIVE - function returns array", async () => {
test("LOAD - function returns array", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE makeRange
LOAD makeRange
PUSH 3
PUSH 1
PUSH 0
@ -139,9 +139,9 @@ test("LOAD_NATIVE - function returns array", async () => {
}
})
test("LOAD_NATIVE - function not found", async () => {
test("LOAD - function not found", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE nonexistent
LOAD nonexistent
PUSH 0
PUSH 0
CALL
@ -149,12 +149,12 @@ test("LOAD_NATIVE - function not found", async () => {
const vm = new VM(bytecode)
expect(vm.run()).rejects.toThrow('LOAD_NATIVE: function not found: nonexistent')
await expect(vm.run()).rejects.toThrow('Undefined variable: nonexistent')
})
test("LOAD_NATIVE - using result in subsequent operations", async () => {
test("LOAD - using result in subsequent operations", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE triple
LOAD triple
PUSH 5
PUSH 1
PUSH 0
@ -175,7 +175,7 @@ test("LOAD_NATIVE - using result in subsequent operations", async () => {
test("Native function wrapping - basic sync function with native types", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE add
LOAD add
PUSH 5
PUSH 10
PUSH 2
@ -196,7 +196,7 @@ test("Native function wrapping - basic sync function with native types", async (
test("Native function wrapping - async function with native types", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE asyncDouble
LOAD asyncDouble
PUSH 42
PUSH 1
PUSH 0
@ -217,7 +217,7 @@ test("Native function wrapping - async function with native types", async () =>
test("Native function wrapping - string manipulation", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE concat
LOAD concat
PUSH "hello"
PUSH "world"
PUSH 2
@ -238,7 +238,7 @@ test("Native function wrapping - string manipulation", async () => {
test("Native function wrapping - with default parameters", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE ls
LOAD ls
PUSH "/home/user"
PUSH 1
PUSH 0
@ -258,7 +258,7 @@ test("Native function wrapping - with default parameters", async () => {
test("Native function wrapping - returns array", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE makeRange
LOAD makeRange
PUSH 3
PUSH 1
PUSH 0
@ -286,7 +286,7 @@ test("Native function wrapping - returns array", async () => {
test("Native function wrapping - returns object (becomes dict)", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE makeUser
LOAD makeUser
PUSH "Alice"
PUSH 30
PUSH 2
@ -311,13 +311,13 @@ test("Native function wrapping - returns object (becomes dict)", async () => {
test("Native function wrapping - mixed with manual Value functions", async () => {
const bytecode = toBytecode(`
LOAD_NATIVE nativeAdd
LOAD nativeAdd
PUSH 5
PUSH 1
PUSH 0
CALL
STORE sum
LOAD_NATIVE manualDouble
LOAD manualDouble
LOAD sum
PUSH 1
PUSH 0


@ -387,7 +387,7 @@ describe("RegExp", () => {
test("with native functions", async () => {
const { VM } = await import("#vm")
const bytecode = toBytecode(`
LOAD_NATIVE match
LOAD match
PUSH "hello world"
PUSH /world/
PUSH 2
@ -410,7 +410,7 @@ describe("RegExp", () => {
test("native function with regex replacement", async () => {
const { VM } = await import("#vm")
const bytecode = toBytecode(`
LOAD_NATIVE replace
LOAD replace
PUSH "hello world"
PUSH /o/g
PUSH "0"
@ -433,7 +433,7 @@ describe("RegExp", () => {
test("native function extracting matches", async () => {
const { VM } = await import("#vm")
const bytecode = toBytecode(`
LOAD_NATIVE extractNumbers
LOAD extractNumbers
PUSH "test123abc456"
PUSH /\\d+/g
PUSH 2