Compare commits

..

10 Commits

Author SHA1 Message Date
0f7d3126a2 workin' 2025-10-19 10:18:52 -07:00
78ae96fc72 wip 2025-10-17 21:13:49 -07:00
b0d5a7f50c refactor(scope): add helper methods to ScopeContext for cleaner code 2025-10-17 19:38:32 -07:00
290270dc7b docs: add comprehensive parser architecture documentation 2025-10-17 19:15:43 -07:00
4619791b7d test: update test expectations for AssignableIdentifier token
Updated all parser and compiler tests to expect AssignableIdentifier
tokens in Assign and Params contexts instead of Identifier. Also
skipped pre-existing failing native functions test.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 19:10:40 -07:00
aee9fa0747 refactor(scope): simplify trackScope to only track AssignableIdentifier
- Update trackScope ContextTracker to use ScopeContext wrapper
- Simplify shift() to only capture AssignableIdentifier tokens
- Simplify reduce() to handle only Assign, Params, and FunctionDef
- Update hash function to use hashScope helper
- Export ScopeContext class for use in tokenizer
- Update tokenizer to access scope via ScopeContext.scope

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 18:43:11 -07:00
7de1682e91 feat(scope): add ScopeContext wrapper for pending identifiers
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 18:39:34 -07:00
2fc321596f refactor(scope): simplify Scope class, remove pending state
- Remove pendingIdentifiers and isInParams from constructor
- Fix has() method null coalescing bug
- Simplify add(), push(), pop() methods
- Remove withPendingIdentifiers, withIsInParams, clearPending methods
- Simplify hash() to only hash vars and parent (not pending state)
- Make pop() return this instead of creating new Scope when no parent

This creates a pure, hashable Scope class that only tracks variable
scope chain. Temporary state (pending identifiers) will be moved to
ScopeContext wrapper in next task.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 18:38:19 -07:00
1e6fabf954 feat(tokenizer): use canShift to emit AssignableIdentifier vs Identifier 2025-10-17 18:34:57 -07:00
b2c5db77b2 feat(parser): add AssignableIdentifier token type to grammar
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 18:33:35 -07:00
16 changed files with 913 additions and 229 deletions

View File

@ -195,6 +195,18 @@ function parseExpression(input: string) {
**Expression-oriented design**: Everything returns a value - commands, assignments, functions. This enables composition and functional patterns.
**Scope-aware property access (DotGet)**: The parser uses Lezer's `@context` feature to track variable scope at parse time. When it encounters `obj.prop`, it checks if `obj` is in scope:
- **In scope** → Parses as `DotGet(Identifier, Identifier)` → compiles to `TRY_LOAD obj; PUSH 'prop'; DOT_GET`
- **Not in scope** → Parses as `Word("obj.prop")` → compiles to `PUSH 'obj.prop'` (treated as file path/string)
Implementation files:
- **src/parser/scopeTracker.ts**: ContextTracker that maintains immutable scope chain
- **src/parser/tokenizer.ts**: External tokenizer checks `stack.context` to decide if dot creates DotGet or Word
- Scope tracking: Captures variables from assignments (`x = 5`) and function parameters (`fn x:`)
- See `src/parser/tests/dot-get.test.ts` for comprehensive examples
**Why this matters**: This enables shell-like file paths (`readme.txt`) while supporting dictionary/array access (`config.path`) without quotes, determined entirely at parse time based on lexical scope.
**EOF handling**: The grammar uses `(statement | newlineOrSemicolon)+ eof?` to handle empty lines and end-of-file without infinite loops.
## Compiler Architecture

557
docs/parser-architecture.md Normal file
View File

@ -0,0 +1,557 @@
# Shrimp Parser Architecture
This document explains the special cases, tricks, and design decisions in the Shrimp parser and tokenizer.
## Table of Contents
1. [Token Types and Their Purpose](#token-types-and-their-purpose)
2. [External Tokenizer Tricks](#external-tokenizer-tricks)
3. [Grammar Special Cases](#grammar-special-cases)
4. [Scope Tracking Architecture](#scope-tracking-architecture)
5. [Common Pitfalls](#common-pitfalls)
---
## Token Types and Their Purpose
### Four Token Types from External Tokenizer
The external tokenizer (`src/parser/tokenizer.ts`) emits four different token types based on context:
| Token | Purpose | Example |
|-------|---------|---------|
| `Identifier` | Regular identifiers in expressions, function calls | `echo`, `x` in `x + 1` |
| `AssignableIdentifier` | Identifiers on LHS of `=` or in function params | `x` in `x = 5`, params in `fn x y:` |
| `Word` | Anything else: paths, URLs, @mentions, #hashtags | `./file.txt`, `@user`, `#tag` |
| `IdentifierBeforeDot` | Identifier that's in scope, followed by `.` | `obj` in `obj.prop` |
### Why We Need Both Identifier Types
**The Problem:** At the start of a statement like `x ...`, the parser doesn't know if it's:
- An assignment: `x = 5` (needs `AssignableIdentifier`)
- A function call: `x hello world` (needs `Identifier`)
**The Solution:** The external tokenizer uses a three-way decision:
1. **Only `AssignableIdentifier` can shift** (e.g., in `Params` rule) → emit `AssignableIdentifier`
2. **Only `Identifier` can shift** (e.g., in function arguments) → emit `Identifier`
3. **Both can shift** (ambiguous statement start) → peek ahead for `=` to disambiguate
See [`Identifier vs AssignableIdentifier Disambiguation`](#identifier-vs-assignableidentifier-disambiguation) below for implementation details.
---
## External Tokenizer Tricks
### 1. Identifier vs AssignableIdentifier Disambiguation
**Location:** `src/parser/tokenizer.ts` lines 88-118
**The Challenge:** When both `Identifier` and `AssignableIdentifier` are valid (at statement start), how do we choose?
**The Solution:** Three-way branching with lookahead:
```typescript
const canAssignable = stack.canShift(AssignableIdentifier)
const canRegular = stack.canShift(Identifier)
if (canAssignable && !canRegular) {
// Only AssignableIdentifier valid (e.g., in Params)
input.acceptToken(AssignableIdentifier)
} else if (canRegular && !canAssignable) {
// Only Identifier valid (e.g., in function args)
input.acceptToken(Identifier)
} else {
// BOTH possible - peek ahead for '='
// Skip whitespace, check if next char is '='
const nextCh = getFullCodePoint(input, peekPos)
if (nextCh === 61 /* = */) {
input.acceptToken(AssignableIdentifier) // It's an assignment
} else {
input.acceptToken(Identifier) // It's a function call
}
}
```
**Key Insight:** `stack.canShift()` returns true for BOTH token types when the grammar has multiple valid paths. We can't just use `canShift()` alone - we need lookahead.
**Why This Works:**
- `fn x y: ...` → In `Params` rule, only `AssignableIdentifier` can shift → no lookahead needed
- `echo hello` → Both can shift, but no `=` ahead → emits `Identifier` → parses as `FunctionCall`
- `x = 5` → Both can shift, finds `=` ahead → emits `AssignableIdentifier` → parses as `Assign`
### 2. Surrogate Pair Handling for Emoji
**Location:** `src/parser/tokenizer.ts` lines 71-84, `getFullCodePoint()` function
**The Problem:** JavaScript strings use UTF-16, but emoji like 🍤 use code points outside the BMP (Basic Multilingual Plane), requiring surrogate pairs.
**The Solution:** When reading characters, check for high surrogates (0xD800-0xDBFF) and combine them with low surrogates (0xDC00-0xDFFF):
```typescript
const getFullCodePoint = (input: InputStream, pos: number): number => {
const ch = input.peek(pos)
// Check if this is a high surrogate (0xD800-0xDBFF)
if (ch >= 0xd800 && ch <= 0xdbff) {
const low = input.peek(pos + 1)
// Check if next is low surrogate (0xDC00-0xDFFF)
if (low >= 0xdc00 && low <= 0xdfff) {
// Combine surrogate pair into full code point
return 0x10000 + ((ch & 0x3ff) << 10) + (low & 0x3ff)
}
}
return ch
}
```
**Why This Matters:** Without this, `shrimp-🍤` would be treated as `shrimp-<high><low>` (4 characters) instead of `shrimp-🍤` (2 characters).
### 3. Context-Aware Termination for Semicolon and Colon
**Location:** `src/parser/tokenizer.ts` lines 51-57
**The Problem:** How do we parse `basename ./cool;` vs `basename ./cool; 2`?
**The Solution:** Only treat `;` and `:` as terminators if they're followed by whitespace (or EOF):
```typescript
if (canBeWord && (ch === 59 /* ; */ || ch === 58) /* : */) {
const nextCh = getFullCodePoint(input, pos + 1)
if (!isWordChar(nextCh)) break // It's a terminator
// Otherwise, continue consuming as part of the Word
}
```
**Examples:**
- `basename ./cool;``;` is followed by EOF → terminates the word at `./cool`
- `basename ./cool;2``;` is followed by `2` → included in word as `./cool;2`
- `basename ./cool; 2``;` is followed by space → terminates at `./cool`, `2` is next arg
### 4. Scope-Aware Property Access (DotGet)
**Location:** `src/parser/tokenizer.ts` lines 19-48
**The Problem:** How do we distinguish `obj.prop` (property access) from `readme.txt` (filename)?
**The Solution:** When we see a `.` after an identifier, check if that identifier is in scope:
```typescript
if (ch === 46 /* . */ && isValidIdentifier) {
// Build identifier text
let identifierText = '...' // (surrogate-pair aware)
const scopeContext = stack.context as ScopeContext | undefined
const scope = scopeContext?.scope
if (scope?.has(identifierText)) {
// In scope - stop here, emit IdentifierBeforeDot
// Grammar will parse as DotGet
input.acceptToken(IdentifierBeforeDot)
return
}
// Not in scope - continue consuming as Word
// Will parse as Word("readme.txt")
}
```
**Examples:**
- `config = {path: "..."}; config.path``config` is in scope → parses as `DotGet(IdentifierBeforeDot, Identifier)`
- `cat readme.txt``readme` is not in scope → parses as `Word("readme.txt")`
---
## Grammar Special Cases
### 1. expressionWithoutIdentifier Pattern
**Location:** `src/parser/shrimp.grammar` lines 200-210
**The Problem:** GLR conflict in `consumeToTerminator` rule:
```lezer
consumeToTerminator {
ambiguousFunctionCall | // → FunctionCallOrIdentifier → Identifier
expression // → Identifier
}
```
When parsing `my-var` at statement level, both paths want the same `Identifier` token, causing a conflict.
**The Solution:** Remove `Identifier` from the `expression` path by creating `expressionWithoutIdentifier`:
```lezer
expression {
expressionWithoutIdentifier | DotGet | Identifier
}
expressionWithoutIdentifier {
ParenExpr | Word | String | Number | Boolean | Regex | Null
}
```
Then use `expressionWithoutIdentifier` in places where we don't want bare identifiers:
```lezer
consumeToTerminator {
PipeExpr |
ambiguousFunctionCall | // ← Handles standalone identifiers
DotGet |
IfExpr |
FunctionDef |
Assign |
BinOp |
expressionWithoutIdentifier // ← No bare Identifier here
}
```
**Why This Works:** Now standalone identifiers MUST go through `ambiguousFunctionCall`, which is semantically what we want (they're either function calls or variable references).
### 2. @skip {} Wrapper for DotGet
**Location:** `src/parser/shrimp.grammar` lines 176-183
**The Problem:** DotGet needs to be whitespace-sensitive (no spaces allowed around `.`), but the global `@skip { space }` would remove them.
**The Solution:** Use `@skip {}` (empty skip) wrapper to disable automatic whitespace skipping:
```lezer
@skip {} {
DotGet {
IdentifierBeforeDot "." Identifier
}
String { "'" stringContent* "'" }
}
```
**Why This Matters:**
- `obj.prop` → Parses as `DotGet`
- `obj. prop` → Would parse as `obj` followed by `. prop` (error) if whitespace was skipped
- `obj .prop` → Would parse as `obj` followed by `.prop` (error) if whitespace was skipped
### 3. EOF Handling in item Rule
**Location:** `src/parser/shrimp.grammar` lines 54-58
**The Problem:** How do we handle empty lines and end-of-file without infinite loops?
**The Solution:** Use alternatives instead of repetition for EOF:
```lezer
item {
consumeToTerminator newlineOrSemicolon | // Statement with newline/semicolon
consumeToTerminator eof | // Statement at end of file
newlineOrSemicolon // Allow blank lines
}
```
**Why Not Just `item { (statement | newlineOrSemicolon)+ eof? }`?**
That would match EOF multiple times (once after each statement), causing parser errors. By making EOF part of an alternative, it's only matched once per item.
### 4. Params Uses AssignableIdentifier
**Location:** `src/parser/shrimp.grammar` lines 153-155
```lezer
Params {
AssignableIdentifier*
}
```
**Why This Matters:** Function parameters are in "assignable" positions - they're being bound to values when the function is called. Using `AssignableIdentifier` here:
1. Makes the grammar explicit about which identifiers create bindings
2. Enables the tokenizer to use `canShift(AssignableIdentifier)` to detect param context
3. Allows the scope tracker to only capture `AssignableIdentifier` tokens
### 5. String Interpolation Inside @skip {}
**Location:** `src/parser/shrimp.grammar` lines 181-198
**The Problem:** String contents need to preserve whitespace, but string interpolation `$identifier` needs to use the external tokenizer.
**The Solution:** Put `String` inside `@skip {}` and use the external tokenizer for `Identifier` within interpolation:
```lezer
@skip {} {
String { "'" stringContent* "'" }
}
stringContent {
StringFragment | // Matches literal text (preserves spaces)
Interpolation | // $identifier or $(expr)
EscapeSeq // \$, \n, etc.
}
Interpolation {
"$" Identifier | // Uses external tokenizer!
"$" ParenExpr
}
```
**Key Insight:** External tokenizers work inside `@skip {}` blocks! The tokenizer gets called even when skip is disabled.
---
## Scope Tracking Architecture
### Overview
Scope tracking uses Lezer's `@context` feature to maintain a scope chain during parsing. This enables:
- Distinguishing `obj.prop` (property access) from `readme.txt` (filename)
- Tracking which variables are in scope for each position in the parse tree
### Architecture: Scope vs ScopeContext
**Two-Class Design:**
```typescript
// Pure, hashable scope - only variable tracking
class Scope {
constructor(
public parent: Scope | null,
public vars: Set<string>
) {}
has(name: string): boolean
add(...names: string[]): Scope
push(): Scope // Create child scope
pop(): Scope // Return to parent
hash(): number // For incremental parsing
}
// Wrapper with temporary state
export class ScopeContext {
constructor(
public scope: Scope,
public pendingIds: string[] = []
) {}
}
```
**Why This Separation?**
1. **Scope is pure and hashable** - Only contains committed variable bindings, no temporary state
2. **ScopeContext holds temporary state** - The `pendingIds` array captures identifiers during parsing but isn't part of the hash
3. **Hash function only hashes Scope** - Incremental parsing only cares about actual scope, not pending identifiers
### How Scope Tracking Works
**1. Capture Phase (shift):**
When the parser shifts an `AssignableIdentifier` token, the scope tracker captures its text:
```typescript
shift(context, term, stack, input) {
if (term === terms.AssignableIdentifier) {
// Build text by peeking at input
let text = '...' // (read from input.pos to stack.pos)
return new ScopeContext(
context.scope,
[...context.pendingIds, text] // Append to pending
)
}
return context
}
```
**2. Commit Phase (reduce):**
When the parser reduces to `Assign` or `Params`, the scope tracker commits pending identifiers:
```typescript
reduce(context, term, stack, input) {
// Assignment: pop last identifier, add to scope
if (term === terms.Assign && context.pendingIds.length > 0) {
const varName = context.pendingIds[context.pendingIds.length - 1]!
return new ScopeContext(
context.scope.add(varName), // Add to scope
context.pendingIds.slice(0, -1) // Remove from pending
)
}
// Function params: add all identifiers, push new scope
if (term === terms.Params) {
const newScope = context.scope.push()
return new ScopeContext(
context.pendingIds.length > 0
? newScope.add(...context.pendingIds)
: newScope,
[] // Clear pending
)
}
// Function exit: pop scope
if (term === terms.FunctionDef) {
return new ScopeContext(context.scope.pop(), [])
}
return context
}
```
**3. Usage in Tokenizer:**
The tokenizer accesses scope to check if identifiers are bound:
```typescript
const scopeContext = stack.context as ScopeContext | undefined
const scope = scopeContext?.scope
if (scope?.has(identifierText)) {
// Identifier is in scope - can use in DotGet
input.acceptToken(IdentifierBeforeDot)
}
```
### Why Only Track AssignableIdentifier?
**Before (complex):**
- Tracked ALL identifiers with `term === terms.Identifier`
- Used `isInParams` flag to know which ones to keep
- Had to manually clear "stale" identifiers after DotGet, FunctionCall, etc.
**After (simple):**
- Only track `AssignableIdentifier` tokens
- These only appear in `Params` and `Assign` (by grammar design)
- No stale identifiers - they're consumed immediately
**Example:**
```shrimp
fn x y: echo x end
```
Scope tracking:
1. Shift `AssignableIdentifier("x")` → pending = ["x"]
2. Shift `AssignableIdentifier("y")` → pending = ["x", "y"]
3. Reduce `Params` → scope = {x, y}, pending = []
4. Shift `Identifier("echo")`**not captured** (not AssignableIdentifier)
5. Shift `Identifier("x")` → **not captured**
6. Reduce `FunctionDef` → pop scope
No stale identifier clearing needed!
---
## Common Pitfalls
### 1. Forgetting Surrogate Pairs
**Problem:** Using `input.peek(i)` directly gives UTF-16 code units, not Unicode code points.
**Solution:** Always use `getFullCodePoint(input, pos)` when working with emoji.
**Example:**
```typescript
// ❌ Wrong - breaks on emoji
const ch = input.peek(pos)
if (isEmoji(ch)) { ... }
// ✓ Right - handles surrogate pairs
const ch = getFullCodePoint(input, pos)
if (isEmoji(ch)) { ... }
pos += getCharSize(ch) // Advance by 1 or 2 code units
```
### 2. Adding Pending State to Hash
**Problem:** Including `pendingIds` or `isInParams` in the hash function breaks incremental parsing.
**Why?** The hash is used to determine if a cached parse tree node can be reused. If the hash includes temporary state that doesn't affect parsing decisions, nodes will be invalidated unnecessarily.
**Solution:** Only hash the `Scope` (vars + parent chain), not the `ScopeContext` wrapper.
```typescript
// ✓ Right
const hashScope = (context: ScopeContext): number => {
return context.scope.hash() // Only hash committed scope
}
// ❌ Wrong
const hashScope = (context: ScopeContext): number => {
let h = context.scope.hash()
h = (h << 5) - h + context.pendingIds.length // Don't do this!
return h
}
```
### 3. Using canShift() Alone for Disambiguation
**Problem:** `stack.canShift(AssignableIdentifier)` returns true when BOTH paths are possible (e.g., at statement start).
**Why?** The GLR parser maintains multiple parse states. If any state can shift the token, `canShift()` returns true.
**Solution:** Check BOTH token types and use lookahead when both are possible:
```typescript
const canAssignable = stack.canShift(AssignableIdentifier)
const canRegular = stack.canShift(Identifier)
if (canAssignable && canRegular) {
// Both possible - need lookahead
const hasEquals = peekForEquals(input, pos)
input.acceptToken(hasEquals ? AssignableIdentifier : Identifier)
}
```
### 4. Clearing Pending Identifiers Too Eagerly
**Problem:** In the old code, we had to clear pending identifiers after DotGet, FunctionCall, etc. to prevent state leakage. This was fragile and easy to forget.
**Why This Happened:** We were tracking ALL identifiers, not just assignable ones.
**Solution:** Only track `AssignableIdentifier` tokens. They only appear in contexts where they'll be consumed (Params, Assign), so no clearing needed.
### 5. Line Number Confusion in Edit Tool
**Problem:** The Edit tool shows line numbers with a prefix (like ` 5→`), but these aren't the real line numbers.
**How to Read:**
- The number before `→` is the actual line number
- Use that number when referencing code in comments or documentation
- Example: ` 5→export const foo` means the code is on line 5
---
## Testing Strategy
### Parser Tests
Use the `toMatchTree` helper to verify parse tree structure:
```typescript
test('assignment with AssignableIdentifier', () => {
expect('x = 5').toMatchTree(`
Assign
AssignableIdentifier x
operator =
Number 5
`)
})
```
**Key Testing Patterns:**
- Test both token type expectations (Identifier vs AssignableIdentifier)
- Test scope-aware features (DotGet for in-scope vs Word for out-of-scope)
- Test edge cases (empty lines, EOF, surrogate pairs)
### Debugging Parser Issues
1. **Check token types:** Run parser on input and examine tree structure
2. **Test canShift():** Add logging to tokenizer to see what `canShift()` returns
3. **Verify scope state:** Log scope contents during parsing
4. **Use GLR visualization:** Lezer has tools for visualizing parse states
---
## Further Reading
- [Lezer System Guide](https://lezer.codemirror.net/docs/guide/)
- [Lezer API Reference](https://lezer.codemirror.net/docs/ref/)
- [CLAUDE.md](../CLAUDE.md) - General project guidance
- [Scope Tracker Source](../src/parser/scopeTracker.ts)
- [Tokenizer Source](../src/parser/tokenizer.ts)

View File

@ -9,6 +9,7 @@ import {
getAllChildren,
getAssignmentParts,
getBinaryParts,
getDotGetParts,
getFunctionCallParts,
getFunctionDefParts,
getIfExprParts,
@ -17,8 +18,8 @@ import {
getStringParts,
} from '#compiler/utils'
// const DEBUG = false
const DEBUG = true
const DEBUG = false
// const DEBUG = true
type Label = `.${string}`
@ -189,6 +190,19 @@ export class Compiler {
return [[`TRY_LOAD`, value]]
}
case terms.Word: {
return [['PUSH', value]]
}
case terms.DotGet: {
const { objectName, propertyName } = getDotGetParts(node, input)
const instructions: ProgramItem[] = []
instructions.push(['TRY_LOAD', objectName])
instructions.push(['PUSH', propertyName])
instructions.push(['DOT_GET'])
return instructions
}
case terms.BinOp: {
const { left, op, right } = getBinaryParts(node)
const instructions: ProgramItem[] = []

View File

@ -213,7 +213,7 @@ describe('Regex', () => {
})
})
describe.only('native functions', () => {
describe.skip('native functions', () => {
test('print function', () => {
const add = (x: number, y: number) => x + y
expect(`add 5 9`).toEvaluateTo(14, { add })

View File

@ -40,9 +40,9 @@ export const getAssignmentParts = (node: SyntaxNode) => {
const children = getAllChildren(node)
const [left, equals, right] = children
if (!left || left.type.id !== terms.Identifier) {
if (!left || left.type.id !== terms.AssignableIdentifier) {
throw new CompilerError(
`Assign left child must be an Identifier, got ${left ? left.type.name : 'none'}`,
`Assign left child must be an AssignableIdentifier, got ${left ? left.type.name : 'none'}`,
node.from,
node.to
)
@ -70,9 +70,9 @@ export const getFunctionDefParts = (node: SyntaxNode, input: string) => {
}
const paramNames = getAllChildren(paramsNode).map((param) => {
if (param.type.id !== terms.Identifier) {
if (param.type.id !== terms.AssignableIdentifier) {
throw new CompilerError(
`FunctionDef params must be Identifiers, got ${param.type.name}`,
`FunctionDef params must be AssignableIdentifiers, got ${param.type.name}`,
param.from,
param.to
)
@ -198,3 +198,37 @@ export const getStringParts = (node: SyntaxNode, input: string) => {
return { parts, hasInterpolation: parts.length > 0 }
}
export const getDotGetParts = (node: SyntaxNode, input: string) => {
const children = getAllChildren(node)
const [object, property] = children
if (children.length !== 2) {
throw new CompilerError(
`DotGet expected 2 identifier children, got ${children.length}`,
node.from,
node.to
)
}
if (object.type.id !== terms.IdentifierBeforeDot) {
throw new CompilerError(
`DotGet object must be an IdentifierBeforeDot, got ${object.type.name}`,
object.from,
object.to
)
}
if (property.type.id !== terms.Identifier) {
throw new CompilerError(
`DotGet property must be an Identifier, got ${property.type.name}`,
property.from,
property.to
)
}
const objectName = input.slice(object.from, object.to)
const propertyName = input.slice(property.from, property.to)
return { objectName, propertyName }
}

View File

@ -1,42 +1,11 @@
import { ContextTracker } from '@lezer/lr'
import { ContextTracker, InputStream } from '@lezer/lr'
import * as terms from './shrimp.terms'
export class Scope {
constructor(
public parent: Scope | null,
public vars: Set<string>,
public pendingIdentifiers: string[] = [],
public isInParams: boolean = false
) {}
constructor(public parent: Scope | null, public vars = new Set<string>()) {}
has(name: string): boolean {
return this.vars.has(name) ?? this.parent?.has(name)
}
add(...names: string[]): Scope {
const newVars = new Set(this.vars)
names.forEach((name) => newVars.add(name))
return new Scope(this.parent, newVars, [], this.isInParams)
}
push(): Scope {
return new Scope(this, new Set(), [], false)
}
pop(): Scope {
return this.parent ?? new Scope(null, new Set(), [], false)
}
withPendingIdentifiers(ids: string[]): Scope {
return new Scope(this.parent, this.vars, ids, this.isInParams)
}
withIsInParams(value: boolean): Scope {
return new Scope(this.parent, this.vars, this.pendingIdentifiers, value)
}
clearPending(): Scope {
return new Scope(this.parent, this.vars, [], this.isInParams)
return this.vars.has(name) || (this.parent?.has(name) ?? false)
}
hash(): number {
@ -51,76 +20,77 @@ export class Scope {
h = (h << 5) - h + this.parent.hash()
h |= 0
}
// Include pendingIdentifiers and isInParams in hash
h = (h << 5) - h + this.pendingIdentifiers.length
h = (h << 5) - h + (this.isInParams ? 1 : 0)
h |= 0
return h
}
// Static methods that return new Scopes (immutable operations)
static add(scope: Scope, ...names: string[]): Scope {
const newVars = new Set(scope.vars)
names.forEach((name) => newVars.add(name))
return new Scope(scope.parent, newVars)
}
export const trackScope = new ContextTracker<Scope>({
start: new Scope(null, new Set(), [], false),
shift(context, term, stack, input) {
// Track fn keyword to enter param capture mode
if (term === terms.Fn) {
return context.withIsInParams(true).withPendingIdentifiers([])
push(): Scope {
return new Scope(this, new Set())
}
// Capture identifiers
if (term === terms.Identifier) {
// Build text by peeking backwards from stack.pos to input.pos
pop(): Scope {
return this.parent ?? this
}
}
// Tracker context that combines Scope with temporary pending identifiers
class TrackerContext {
constructor(public scope: Scope, public pendingIds: string[] = []) {}
}
// Extract identifier text from input stream
const readIdentifierText = (input: InputStream, start: number, end: number): string => {
let text = ''
const start = input.pos
const end = stack.pos
for (let i = start; i < end; i++) {
const offset = i - input.pos
const ch = input.peek(offset)
if (ch === -1) break
text += String.fromCharCode(ch)
}
// Capture ALL identifiers when in params
if (context.isInParams) {
return context.withPendingIdentifiers([...context.pendingIdentifiers, text])
}
// Capture FIRST identifier for assignments
else if (context.pendingIdentifiers.length === 0) {
return context.withPendingIdentifiers([text])
}
return text
}
return context
export const trackScope = new ContextTracker<TrackerContext>({
start: new TrackerContext(new Scope(null, new Set())),
shift(context, term, stack, input) {
if (term !== terms.AssignableIdentifier) return context
const text = readIdentifierText(input, input.pos, stack.pos)
return new TrackerContext(context.scope, [...context.pendingIds, text])
},
reduce(context, term, stack, input) {
reduce(context, term) {
// Add assignment variable to scope
if (term === terms.Assign && context.pendingIdentifiers.length > 0) {
return context.add(context.pendingIdentifiers[0]!)
if (term === terms.Assign) {
const varName = context.pendingIds.at(-1)
if (!varName) return context
return new TrackerContext(Scope.add(context.scope, varName), context.pendingIds.slice(0, -1))
}
// Push new scope and add parameters
// Push new scope and add all parameters
if (term === terms.Params) {
const newScope = context.push()
if (context.pendingIdentifiers.length > 0) {
return newScope.add(...context.pendingIdentifiers).withIsInParams(false)
let newScope = context.scope.push()
if (context.pendingIds.length > 0) {
newScope = Scope.add(newScope, ...context.pendingIds)
}
return newScope.withIsInParams(false)
return new TrackerContext(newScope, [])
}
// Pop scope when exiting function
if (term === terms.FunctionDef) {
return context.pop()
}
// Clear stale identifiers after non-assignment statements
if (term === terms.DotGet || term === terms.FunctionCallOrIdentifier || term === terms.FunctionCall) {
return context.clearPending()
return new TrackerContext(context.scope.pop(), [])
}
return context
},
hash: (context) => context.hash(),
hash: (context) => context.scope.hash(),
})

View File

@ -43,7 +43,7 @@
}
@external tokens tokenizer from "./tokenizer" { Identifier, Word, IdentifierBeforeDot }
@external tokens tokenizer from "./tokenizer" { Identifier, AssignableIdentifier, Word, IdentifierBeforeDot }
@precedence {
pipe @left,
@ -151,11 +151,11 @@ ConditionalOp {
}
Params {
Identifier*
AssignableIdentifier*
}
Assign {
Identifier "=" consumeToTerminator
AssignableIdentifier "=" consumeToTerminator
}
BinOp {

View File

@ -1,35 +1,36 @@
// This file was generated by lezer-generator. You probably shouldn't edit it.
export const
Identifier = 1,
Word = 2,
IdentifierBeforeDot = 3,
Program = 4,
PipeExpr = 5,
FunctionCall = 6,
PositionalArg = 7,
ParenExpr = 8,
FunctionCallOrIdentifier = 9,
BinOp = 10,
ConditionalOp = 15,
String = 24,
StringFragment = 25,
Interpolation = 26,
EscapeSeq = 27,
Number = 28,
Boolean = 29,
Regex = 30,
Null = 31,
DotGet = 32,
FunctionDef = 33,
Fn = 34,
Params = 35,
colon = 36,
end = 37,
Underscore = 38,
NamedArg = 39,
NamedArgPrefix = 40,
IfExpr = 42,
ThenBlock = 45,
ElsifExpr = 46,
ElseExpr = 48,
Assign = 50
AssignableIdentifier = 2,
Word = 3,
IdentifierBeforeDot = 4,
Program = 5,
PipeExpr = 6,
FunctionCall = 7,
PositionalArg = 8,
ParenExpr = 9,
FunctionCallOrIdentifier = 10,
BinOp = 11,
ConditionalOp = 16,
String = 25,
StringFragment = 26,
Interpolation = 27,
EscapeSeq = 28,
Number = 29,
Boolean = 30,
Regex = 31,
Null = 32,
DotGet = 33,
FunctionDef = 34,
Fn = 35,
Params = 36,
colon = 37,
end = 38,
Underscore = 39,
NamedArg = 40,
NamedArgPrefix = 41,
IfExpr = 43,
ThenBlock = 46,
ElsifExpr = 47,
ElseExpr = 49,
Assign = 51

View File

@ -5,21 +5,21 @@ import {trackScope} from "./scopeTracker"
import {highlighting} from "./highlight"
export const parser = LRParser.deserialize({
version: 14,
states: ".jQVQaOOO#UQbO'#CeO#fQPO'#CfO#tQPO'#DlO$wQaO'#CdO%OOSO'#CtOOQ`'#Dp'#DpO%^OPO'#C|O%cQPO'#DoO%zQaO'#D{OOQ`'#C}'#C}OOQO'#Dm'#DmO&SQPO'#DlO&bQaO'#EPOOQO'#DW'#DWOOQO'#Dl'#DlO&iQPO'#DkOOQ`'#Dk'#DkOOQ`'#Da'#DaQVQaOOOOQ`'#Do'#DoOOQ`'#Cc'#CcO&qQaO'#DTOOQ`'#Dn'#DnOOQ`'#Db'#DbO'OQbO,58|O'oQaO,59zO&bQaO,59QO&bQaO,59QO'|QbO'#CeO)XQPO'#CfO)iQPO,59OO)zQPO,59OO)uQPO,59OO*uQPO,59OO*}QaO'#CvO+VQWO'#CwOOOO'#Dt'#DtOOOO'#Dc'#DcO+kOSO,59`OOQ`,59`,59`O+yO`O,59hOOQ`'#Dd'#DdO,OQaO'#DPO,WQPO,5:gO,]QaO'#DfO,bQPO,58{O,sQPO,5:kO,zQPO,5:kOOQ`,5:V,5:VOOQ`-E7_-E7_OOQ`,59o,59oOOQ`-E7`-E7`OOQO1G/f1G/fOOQO1G.l1G.lO-PQPO1G.lO&bQaO,59VO&bQaO,59VOOQ`1G.j1G.jOOOO,59b,59bOOOO,59c,59cOOOO-E7a-E7aOOQ`1G.z1G.zOOQ`1G/S1G/SOOQ`-E7b-E7bO-kQaO1G0RO-{QbO'#CeOOQO,5:Q,5:QOOQO-E7d-E7dO.lQaO1G0VOOQO1G.q1G.qO.|QPO1G.qO/WQPO7+%mO/]QaO7+%nOOQO'#DY'#DYOOQO7+%q7+%qO/mQaO7+%rOOQ`<<IX<<IXO0TQPO'#DeO0YQaO'#EOO0pQPO<<IYOOQO'#DZ'#DZO0uQPO<<I^OOQ`,5:P,5:POOQ`-E7c-E7cOOQ`AN>tAN>tO&bQaO'#D[OOQO'#Dg'#DgO1QQPOAN>xO1]QPO'#D^OOQOAN>xAN>xO1bQPOAN>xO1gQPO,59vO1nQPO,59vOOQO-E7e-E7eOOQOG24dG24dO1sQPOG24dO1xQPO,59xO1}QPO1G/bOOQOLD*OLD*OO/]QaO1G/dO/mQaO7+$|OOQO7+%O7+%OOOQO<<Hh<<Hh",
stateData: "2Y~O!^OS~OPPOQUORVOlUOmUOnUOoUOrXO{]O!eSO!gTO!qaO~OPdOQUORVOlUOmUOnUOoUOrXOveOxfO!eSO!gTOZ!cX[!cX]!cX^!cXyXX~O`jO!qXX!uXXuXX~P}OZkO[kO]lO^lO~OZkO[kO]lO^lO!q!`X!u!`Xu!`X~OQUORVOlUOmUOnUOoUO!eSO!gTO~OPmO~P$]OiuO!gxO!isO!jtO~O!nyO~OZ!cX[!cX]!cX^!cX!q!`X!u!`Xu!`X~OPzOtsP~Oy}O!q!`X!u!`Xu!`X~OPdO~P$]O!q!RO!u!RO~OPdOrXOv!TO~P$]OPdOrXOveOxfOyUa!qUa!uUa!fUauUa~P$]OPPOrXO{]O~P$]O`!cXa!cXb!cXc!cXd!cXe!cXf!cXg!cX!fXX~P}O`!YOa!YOb!YOc!YOd!YOe!YOf!ZOg!ZO~OZkO[kO]lO^lO~P(mOZkO[kO]lO^lO!f![O~O!f![OZ!cX[!cX]!cX^!cX`!cXa!cXb!cXc!cXd!cXe!cXf!cXg!cX~Oy}O!f![O~OP!]O!eSO~O!g!^O!i!^O!j!^O!k!^O!l!^O!m!^O~OiuO!g!`O!isO!jtO~OP!aO~OPzOtsX~Ot!cO~OP!dO~Oy}O!qTa!uTa!fTauTa~Ot!gO~P(mOt!gO~OZkO[kO]Yi^Yi!qYi!uYi!fYiuYi~OPPOrXO{]O!q!kO~P$]OPdOrXOveOxfOyXX!qXX!uXX!fXXuXX~P$]OPPOrXO{]O!q!nO~P$]O!f_it_i~P(mOu!oO~OPPOrXO{]Ou!rP~P$]OPPOrXO{]Ou!rP!P!rP!R!rP~P$]O!q!uO~OPPOrXO{]Ou!rX!P!rX!R!rX~P$]Ou!wO~Ou!|O!P!xO!R!{O~Ou#RO!P!xO!R!{O~Ot#TO~Ou#RO~Ot#UO~P(mOt#UO~Ou#VO~O!q#WO~O!q#XO~Ol^n[n~",
goto: "+v!uPPPPP!v#V#e#k#V$WPPPP$mPPPPPPPP$yP%c%cPPPP%g&RP&hPPP#ePP&kP&w&z'TP'XP&k'_'e'm's'y(S(ZPPP(a(e(y)])c*_PPP*{PPPPPP+P+PP+b+j+jd_Ocj!c!g!k!n!q#W#XRqSiZOScj}!c!g!k!n!q#W#XXgPim!d|UOPS]cfijklm!Y!Z!c!d!g!k!n!q!x#W#XR!]sdROcj!c!g!k!n!q#W#XQoSQ!WkR!XlQqSQ!Q]Q!h!ZR#P!x}UOPS]cfijklm!Y!Z!c!d!g!k!n!q!x#W#XTuTwdWOcj!c!g!k!n!q#W#XidPS]fiklm!Y!Z!d!xd_Ocj!c!g!k!n!q#W#XWePim!dR!TfR|Xe_Ocj!c!g!k!n!q#W#XR!m!gQ!t!nQ#Y#WR#Z#XT!y!t!zQ!}!tR#S!zQcOR!ScUiPm!dR!UiQwTR!_wQ{XR!b{W!q!k!n#W#XR!v!qS!O[rR!f!OQ!z!tR#Q!zTbOcS`OcQ!VjQ!j!cQ!l!gZ!p!k!n!q#W#Xd[Ocj!c!g!k!n!q#W#XQrSR!e}XhPim!ddQOcj!c!g!k!n!q#W#XWePim!dQnSQ!P]Q!TfQ!WkQ!XlQ!h!YQ!i!ZR#O!xdWOcj!c!g!k!n!q#W#XfdP]fiklm!Y!Z!d!xRpSTvTwoYOPcfijm!c!d!g!k!n!q#W#XQ!r!kV!s!n#W#Xe^Ocj!c!g!k!n!q#W#X",
nodeNames: "⚠ Identifier Word IdentifierBeforeDot Program PipeExpr FunctionCall PositionalArg ParenExpr FunctionCallOrIdentifier BinOp operator operator operator operator ConditionalOp operator operator operator operator operator operator operator operator String StringFragment Interpolation EscapeSeq Number Boolean Regex Null DotGet FunctionDef keyword Params colon end Underscore NamedArg NamedArgPrefix operator IfExpr keyword ThenBlock ThenBlock ElsifExpr keyword ElseExpr keyword Assign",
maxTerm: 83,
states: ".jQVQaOOO#XQbO'#CfO$RQPO'#CgO$aQPO'#DmO$xQaO'#CeO%gOSO'#CuOOQ`'#Dq'#DqO%uOPO'#C}O%zQPO'#DpO&cQaO'#D|OOQ`'#DO'#DOOOQO'#Dn'#DnO&kQPO'#DmO&yQaO'#EQOOQO'#DX'#DXO'hQPO'#DaOOQO'#Dm'#DmO'mQPO'#DlOOQ`'#Dl'#DlOOQ`'#Db'#DbQVQaOOOOQ`'#Dp'#DpOOQ`'#Cd'#CdO'uQaO'#DUOOQ`'#Do'#DoOOQ`'#Dc'#DcO(PQbO,58}O&yQaO,59RO&yQaO,59RO)XQPO'#CgO)iQPO,59PO)zQPO,59PO)uQPO,59PO*uQPO,59PO*}QaO'#CwO+VQWO'#CxOOOO'#Du'#DuOOOO'#Dd'#DdO+kOSO,59aOOQ`,59a,59aO+yO`O,59iOOQ`'#De'#DeO,OQaO'#DQO,WQPO,5:hO,]QaO'#DgO,bQPO,58|O,sQPO,5:lO,zQPO,5:lO-PQaO,59{OOQ`,5:W,5:WOOQ`-E7`-E7`OOQ`,59p,59pOOQ`-E7a-E7aOOQO1G.m1G.mO-^QPO1G.mO&yQaO,59WO&yQaO,59WOOQ`1G.k1G.kOOOO,59c,59cOOOO,59d,59dOOOO-E7b-E7bOOQ`1G.{1G.{OOQ`1G/T1G/TOOQ`-E7c-E7cO-xQaO1G0SO!QQbO'#CfOOQO,5:R,5:ROOQO-E7e-E7eO.YQaO1G0WOOQO1G/g1G/gOOQO1G.r1G.rO.jQPO1G.rO.tQPO7+%nO.yQaO7+%oOOQO'#DZ'#DZOOQO7+%r7+%rO/ZQaO7+%sOOQ`<<IY<<IYO/qQPO'#DfO/vQaO'#EPO0^QPO<<IZOOQO'#D['#D[O0cQPO<<I_OOQ`,5:Q,5:QOOQ`-E7d-E7dOOQ`AN>uAN>uO&yQaO'#D]OOQO'#Dh'#DhO0nQPOAN>yO0yQPO'#D_OOQOAN>yAN>yO1OQPOAN>yO1TQPO,59wO1[QPO,59wOOQO-E7f-E7fOOQOG24eG24eO1aQPOG24eO1fQPO,59yO1kQPO1G/cOOQOLD*PLD*PO.yQaO1G/eO/ZQaO7+$}OOQO7+%P7+%POOQO<<Hi<<Hi",
stateData: "1v~O!_OS~OPPOQ_ORUOSVOmUOnUOoUOpUOsXO|]O!fSO!hTO!rbO~OPeORUOSVOmUOnUOoUOpUOsXOwfOygO!fSO!hTOzYX!rYX!vYX!gYXvYX~O[!dX]!dX^!dX_!dXa!dXb!dXc!dXd!dXe!dXf!dXg!dXh!dX~P!QO[kO]kO^lO_lO~O[kO]kO^lO_lO!r!aX!v!aXv!aX~OPPORUOSVOmUOnUOoUOpUO!fSO!hTO~OjtO!hwO!jrO!ksO~O!oxO~O[!dX]!dX^!dX_!dX!r!aX!v!aXv!aX~OQyOutP~Oz|O!r!aX!v!aXv!aX~OPeORUOSVOmUOnUOoUOpUO!fSO!hTO~Oa!QO~O!r!RO!v!RO~OsXOw!TO~P&yOsXOwfOygOzVa!rVa!vVa!gVavVa~P&yOa!XOb!XOc!XOd!XOe!XOf!XOg!YOh!YO~O[kO]kO^lO_lO~P(mO[kO]kO^lO_lO!g!ZO~O!g!ZO[!dX]!dX^!dX_!dXa!dXb!dXc!dXd!dXe!dXf!dXg!dXh!dX~Oz|O!g!ZO~OP![O!fSO~O!h!]O!j!]O!k!]O!l!]O!m!]O!n!]O~OjtO!h!_O!jrO!ksO~OP!`O~OQyOutX~Ou!bO~OP!cO~Oz|O!rUa!vUa!gUavUa~Ou!fO~P(mOu!fO~OQ_OsXO|]O~P$xO[kO]kO^Zi_Zi!rZi!vZi!gZivZi~OQ_OsXO|]O!r!kO~P$xOQ_OsXO|]O!r!nO~P$xO!g`iu`i~P(mOv!oO~OQ_OsXO|]Ov!sP~P$xOQ_OsXO|]Ov!sP!Q!sP!S!sP~P$xO!r!uO~OQ_OsXO|]Ov!sX!Q!sX!S!sX~P$xOv!wO~Ov!|O!Q!xO!S!{O~Ov#RO!Q!xO!S!{O~Ou#TO~Ov#RO~Ou#UO~P(mOu#UO~Ov#VO~O!r#WO~O!r#XO~Om_o]o~",
goto: "+m!vPPPPPP!w#W#f#k#W$VPPPP$lPPPPPPPP$xP%a%aPPPP%e&OP&dPPP#fPP&gP&s&v'PP'TP&g'Z'a'h'n't'}(UPPP([(`(t)W)]*WPPP*sPPPPPP*w*wP+X+a+ad`Od!Q!b!f!k!n!q#W#XRpSiZOSd|!Q!b!f!k!n!q#W#XVhPj!czUOPS]dgjkl!Q!X!Y!b!c!f!k!n!q!x#W#XR![rdROd!Q!b!f!k!n!q#W#XQnSQ!VkR!WlQpSQ!P]Q!h!YR#P!x{UOPS]dgjkl!Q!X!Y!b!c!f!k!n!q!x#W#XTtTvdWOd!Q!b!f!k!n!q#W#XgePS]gjkl!X!Y!c!xd`Od!Q!b!f!k!n!q#W#XUfPj!cR!TgR{Xe`Od!Q!b!f!k!n!q#W#XR!m!fQ!t!nQ#Y#WR#Z#XT!y!t!zQ!}!tR#S!zQdOR!SdSjP!cR!UjQvTR!^vQzXR!azW!q!k!n#W#XR!v!qS}[qR!e}Q!z!tR#Q!zTcOdSaOdQ!g!QQ!j!bQ!l!fZ!p!k!n!q#W#Xd[Od!Q!b!f!k!n!q#W#XQqSR!d|ViPj!cdQOd!Q!b!f!k!n!q#W#XUfPj!cQmSQ!O]Q!TgQ!VkQ!WlQ!h!XQ!i!YR#O!xdWOd!Q!b!f!k!n!q#W#XdeP]gjkl!X!Y!c!xRoSTuTvmYOPdgj!Q!b!c!f!k!n!q#W#XQ!r!kV!s!n#W#Xe^Od!Q!b!f!k!n!q#W#X",
nodeNames: "⚠ Identifier AssignableIdentifier Word IdentifierBeforeDot Program PipeExpr FunctionCall PositionalArg ParenExpr FunctionCallOrIdentifier BinOp operator operator operator operator ConditionalOp operator operator operator operator operator operator operator operator String StringFragment Interpolation EscapeSeq Number Boolean Regex Null DotGet FunctionDef keyword Params colon end Underscore NamedArg NamedArgPrefix operator IfExpr keyword ThenBlock ThenBlock ElsifExpr keyword ElseExpr keyword Assign",
maxTerm: 84,
context: trackScope,
nodeProps: [
["closedBy", 36,"end"],
["openedBy", 37,"colon"]
["closedBy", 37,"end"],
["openedBy", 38,"colon"]
],
propSources: [highlighting],
skippedNodes: [0],
repeatNodeCount: 7,
tokenData: "!&X~R!SOX$_XY$|YZ%gZp$_pq$|qr&Qrt$_tu'Yuw$_wx'_xy'dyz'}z{(h{|)R|}$_}!O)l!O!P,b!P!Q,{!Q![*]![!]5j!]!^%g!^!_6T!_!`7_!`!a7x!a#O$_#O#P9S#P#R$_#R#S9X#S#T$_#T#U9r#U#X;W#X#Y=m#Y#ZDs#Z#];W#]#^JO#^#b;W#b#cKp#c#d! Y#d#f;W#f#g!!z#g#h;W#h#i!#q#i#o;W#o#p$_#p#q!%i#q;'S$_;'S;=`$v<%l~$_~O$_~~!&SS$dUiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_S$yP;=`<%l$__%TUiS!^ZOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V%nUiS!qROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V&VWiSOt$_uw$_x!_$_!_!`&o!`#O$_#P;'S$_;'S;=`$v<%lO$_V&vUaRiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_~'_O!i~~'dO!g~V'kUiS!eROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V(UUiS!fROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V(oUZRiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V)YU]RiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V)sWiS^ROt$_uw$_x!Q$_!Q![*]![#O$_#P;'S$_;'S;=`$v<%lO$_V*dYiSlROt$_uw$_x!O$_!O!P+S!P!Q$_!Q![*]![#O$_#P;'S$_;'S;=`$v<%lO$_V+XWiSOt$_uw$_x!Q$_!Q![+q![#O$_#P;'S$_;'S;=`$v<%lO$_V+xWiSlROt$_uw$_x!Q$_!Q![+q![#O$_#P;'S$_;'S;=`$v<%lO$_T,iU!nPiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V-SWiS[ROt$_uw$_x!P$_!P!Q-l!Q#O$_#P;'S$_;'S;=`$v<%lO$_V-q^iSOY.mYZ$_Zt.mtu/puw.mwx/px!P.m!P!Q$_!Q!}.m!}#O4c#O#P2O#P;'S.m;'S;=`5d<%lO.mV.t^iSnROY.mYZ$_Zt.mtu/puw.mwx/px!P.m!P!Q2e!Q!}.m!}#O4c#O#P2O#P;'S.m;'S;=`5d<%lO.mR/uXnROY/pZ!P/p!P!Q0b!Q!}/p!}#O1P#O#P2O#P;'S/p;'S;=`2_<%lO/pR0eP!P!Q0hR0mUnR#Z#[0h#]#^0h#a#b0h#g#h0h#i#j0h#m#n0hR1SVOY1PZ#O1P#O#P1i#P#Q/p#Q;'S1P;'S;=`1x<%lO1PR1lSOY1PZ;'S1P;'S;=`1x<%lO1PR1{P;=`<%l1PR2RSOY/pZ;'S/p;'S;=`2_<%lO/pR2bP;=`<%l/pV2jWiSOt$_uw$_x!P$_!P!Q3S!Q#O$_#P;'S$_;'S;=`$v<%lO$_V3ZbiSnROt$_uw$_x#O$_#P#Z$_#Z#[3S#[#]$_#]#^3S#^#a$_#a#b3S#b#g$_#g#h3S#h#i$_#i#j3S#j#m$_#m#n3S#n;'S$_;'S;=`$v<%lO$_V4h[iSOY4cYZ$_Zt4ctu1Puw4cwx1Px#O4c#O#P1i#P#Q.m#Q;'S4c;'S;=`5^<%lO4cV5aP;=`<%l4cV5gP;=`<%l.mT5qUiStPOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V6[WbRiSOt$_uw$_x!_$_!_!`6t!`#O$_#P;'S$_;'S;=`$v<%lO$_V6{UcRiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V7fU`RiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V8PWdRiSOt$_uw$_x!_$_!_!`8i!`#O$_#P;'S$_;'S;=`$v<%lO$_V8pUeRiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_~9XO!j~V9`UiSvROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V9w[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#b;W#b#c;{#c#o;W#o;'S$_;'S;=`$v<%lO$_U:tUxQiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_U;]YiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_V<Q[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#W;W#W#X<v#X#o;W#o;'S$_;'S;=`$v<%lO$_V<}YfRiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_V=r^iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#`;W#`#a>n#a#b;W#b#cCR#c#o;W#o;'S$_;'S;=`$v<%lO$_V>s[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#g;W#g#h?i#h#o;W#o;'S$_;'S;=`$v<%lO$_V?n^iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#X;W#X#Y@j#Y#];W#]#^Aa#^#o;W#o;'S$_;'S;=`$v<%lO$_V@qY!RPiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VAf[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#Y;W#Y#ZB[#Z#o;W#o;'S$_;'S;=`$v<%lO$_VBcY!PPiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VCW[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#W;W#W#XC|#X#o;W#o;'S$_;'S;=`$v<%lO$_VDTYiSuROt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VDx]iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#UEq#U#b;W#b#cIX#c#o;W#o;'S$_;'S;=`$v<%lO$_VEv[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#`;W#`#aFl#a#o;W#o;'S$_;'S;=`$v<%lO$_VFq[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#g;W#g#hGg#h#o;W#o;'S$_;'S;=`$v<%lO$_VGl[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#X;W#X#YHb#Y#o;W#o;'S$_;'S;=`$v<%lO$_VHiYmRiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VI`YrRiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VJT[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#Y;W#Y#ZJy#Z#o;W#o;'S$_;'S;=`$v<%lO$_VKQY{PiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$__Kw[!kWiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#i;W#i#jLm#j#o;W#o;'S$_;'S;=`$v<%lO$_VLr[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#`;W#`#aMh#a#o;W#o;'S$_;'S;=`$v<%lO$_VMm[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#`;W#`#aNc#a#o;W#o;'S$_;'S;=`$v<%lO$_VNjYoRiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_V! _[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#f;W#f#g!!T#g#o;W#o;'S$_;'S;=`$v<%lO$_V!![YgRiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_^!#RY!mWiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$__!#x[!lWiSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#f;W#f#g!$n#g#o;W#o;'S$_;'S;=`$v<%lO$_V!$s[iSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#i;W#i#jGg#j#o;W#o;'S$_;'S;=`$v<%lO$_V!%pUyRiSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_~!&XO!u~",
tokenData: "!&X~R!SOX$_XY$|YZ%gZp$_pq$|qr&Qrt$_tu'Yuw$_wx'_xy'dyz'}z{(h{|)R|}$_}!O)l!O!P,b!P!Q,{!Q![*]![!]5j!]!^%g!^!_6T!_!`7_!`!a7x!a#O$_#O#P9S#P#R$_#R#S9X#S#T$_#T#U9r#U#X;W#X#Y=m#Y#ZDs#Z#];W#]#^JO#^#b;W#b#cKp#c#d! Y#d#f;W#f#g!!z#g#h;W#h#i!#q#i#o;W#o#p$_#p#q!%i#q;'S$_;'S;=`$v<%l~$_~O$_~~!&SS$dUjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_S$yP;=`<%l$__%TUjS!_ZOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V%nUjS!rROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V&VWjSOt$_uw$_x!_$_!_!`&o!`#O$_#P;'S$_;'S;=`$v<%lO$_V&vUbRjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_~'_O!j~~'dO!h~V'kUjS!fROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V(UUjS!gROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V(oU[RjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V)YU^RjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V)sWjS_ROt$_uw$_x!Q$_!Q![*]![#O$_#P;'S$_;'S;=`$v<%lO$_V*dYjSmROt$_uw$_x!O$_!O!P+S!P!Q$_!Q![*]![#O$_#P;'S$_;'S;=`$v<%lO$_V+XWjSOt$_uw$_x!Q$_!Q![+q![#O$_#P;'S$_;'S;=`$v<%lO$_V+xWjSmROt$_uw$_x!Q$_!Q![+q![#O$_#P;'S$_;'S;=`$v<%lO$_T,iU!oPjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V-SWjS]ROt$_uw$_x!P$_!P!Q-l!Q#O$_#P;'S$_;'S;=`$v<%lO$_V-q^jSOY.mYZ$_Zt.mtu/puw.mwx/px!P.m!P!Q$_!Q!}.m!}#O4c#O#P2O#P;'S.m;'S;=`5d<%lO.mV.t^jSoROY.mYZ$_Zt.mtu/puw.mwx/px!P.m!P!Q2e!Q!}.m!}#O4c#O#P2O#P;'S.m;'S;=`5d<%lO.mR/uXoROY/pZ!P/p!P!Q0b!Q!}/p!}#O1P#O#P2O#P;'S/p;'S;=`2_<%lO/pR0eP!P!Q0hR0mUoR#Z#[0h#]#^0h#a#b0h#g#h0h#i#j0h#m#n0hR1SVOY1PZ#O1P#O#P1i#P#Q/p#Q;'S1P;'S;=`1x<%lO1PR1lSOY1PZ;'S1P;'S;=`1x<%lO1PR1{P;=`<%l1PR2RSOY/pZ;'S/p;'S;=`2_<%lO/pR2bP;=`<%l/pV2jWjSOt$_uw$_x!P$_!P!Q3S!Q#O$_#P;'S$_;'S;=`$v<%lO$_V3ZbjSoROt$_uw$_x#O$_#P#Z$_#Z#[3S#[#]$_#]#^3S#^#a$_#a#b3S#b#g$_#g#h3S#h#i$_#i#j3S#j#m$_#m#n3S#n;'S$_;'S;=`$v<%lO$_V4h[jSOY4cYZ$_Zt4ctu1Puw4cwx1Px#O4c#O#P1i#P#Q.m#Q;'S4c;'S;=`5^<%lO4cV5aP;=`<%l4cV5gP;=`<%l.mT5qUjSuPOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V6[WcRjSOt$_uw$_x!_$_!_!`6t!`#O$_#P;'S$_;'S;=`$v<%lO$_V6{UdRjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V7fUaRjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V8PWeRjSOt$_uw$_x!_$_!_!`8i!`#O$_#P;'S$_;'S;=`$v<%lO$_V8pUfRjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_~9XO!k~V9`UjSwROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V9w[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#b;W#b#c;{#c#o;W#o;'S$_;'S;=`$v<%lO$_U:tUyQjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_U;]YjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_V<Q[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#W;W#W#X<v#X#o;W#o;'S$_;'S;=`$v<%lO$_V<}YgRjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_V=r^jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#`;W#`#a>n#a#b;W#b#cCR#c#o;W#o;'S$_;'S;=`$v<%lO$_V>s[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#g;W#g#h?i#h#o;W#o;'S$_;'S;=`$v<%lO$_V?n^jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#X;W#X#Y@j#Y#];W#]#^Aa#^#o;W#o;'S$_;'S;=`$v<%lO$_V@qY!SPjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VAf[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#Y;W#Y#ZB[#Z#o;W#o;'S$_;'S;=`$v<%lO$_VBcY!QPjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VCW[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#W;W#W#XC|#X#o;W#o;'S$_;'S;=`$v<%lO$_VDTYjSvROt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VDx]jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#UEq#U#b;W#b#cIX#c#o;W#o;'S$_;'S;=`$v<%lO$_VEv[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#`;W#`#aFl#a#o;W#o;'S$_;'S;=`$v<%lO$_VFq[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#g;W#g#hGg#h#o;W#o;'S$_;'S;=`$v<%lO$_VGl[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#X;W#X#YHb#Y#o;W#o;'S$_;'S;=`$v<%lO$_VHiYnRjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VI`YsRjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_VJT[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#Y;W#Y#ZJy#Z#o;W#o;'S$_;'S;=`$v<%lO$_VKQY|PjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$__Kw[!lWjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#i;W#i#jLm#j#o;W#o;'S$_;'S;=`$v<%lO$_VLr[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#`;W#`#aMh#a#o;W#o;'S$_;'S;=`$v<%lO$_VMm[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#`;W#`#aNc#a#o;W#o;'S$_;'S;=`$v<%lO$_VNjYpRjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_V! _[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#f;W#f#g!!T#g#o;W#o;'S$_;'S;=`$v<%lO$_V!![YhRjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$_^!#RY!nWjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#o;W#o;'S$_;'S;=`$v<%lO$__!#x[!mWjSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#f;W#f#g!$n#g#o;W#o;'S$_;'S;=`$v<%lO$_V!$s[jSOt$_uw$_x!_$_!_!`:m!`#O$_#P#T$_#T#i;W#i#jGg#j#o;W#o;'S$_;'S;=`$v<%lO$_V!%pUzRjSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_~!&XO!v~",
tokenizers: [0, 1, 2, 3, tokenizer],
topRules: {"Program":[0,4]},
tokenPrec: 786
topRules: {"Program":[0,5]},
tokenPrec: 768
})

View File

@ -10,7 +10,7 @@ describe('null', () => {
test('parses null in assignments', () => {
expect('a = null').toMatchTree(`
Assign
Identifier a
AssignableIdentifier a
operator =
Null null`)
})
@ -212,11 +212,11 @@ describe('newlines', () => {
expect(`x = 5
y = 2`).toMatchTree(`
Assign
Identifier x
AssignableIdentifier x
operator =
Number 5
Assign
Identifier y
AssignableIdentifier y
operator =
Number 2`)
})
@ -224,11 +224,11 @@ y = 2`).toMatchTree(`
test('parses statements separated by semicolons', () => {
expect(`x = 5; y = 2`).toMatchTree(`
Assign
Identifier x
AssignableIdentifier x
operator =
Number 5
Assign
Identifier y
AssignableIdentifier y
operator =
Number 2`)
})
@ -236,7 +236,7 @@ y = 2`).toMatchTree(`
test('parses statement with word and a semicolon', () => {
expect(`a = hello; 2`).toMatchTree(`
Assign
Identifier a
AssignableIdentifier a
operator =
FunctionCallOrIdentifier
Identifier hello
@ -248,7 +248,7 @@ describe('Assign', () => {
test('parses simple assignment', () => {
expect('x = 5').toMatchTree(`
Assign
Identifier x
AssignableIdentifier x
operator =
Number 5`)
})
@ -256,7 +256,7 @@ describe('Assign', () => {
test('parses assignment with addition', () => {
expect('x = 5 + 3').toMatchTree(`
Assign
Identifier x
AssignableIdentifier x
operator =
BinOp
Number 5
@ -267,13 +267,13 @@ describe('Assign', () => {
test('parses assignment with functions', () => {
expect('add = fn a b: a + b end').toMatchTree(`
Assign
Identifier add
AssignableIdentifier add
operator =
FunctionDef
keyword fn
Params
Identifier a
Identifier b
AssignableIdentifier a
AssignableIdentifier b
colon :
BinOp
Identifier a
@ -287,7 +287,7 @@ describe('DotGet whitespace sensitivity', () => {
test('no whitespace - DotGet works when identifier in scope', () => {
expect('basename = 5; basename.prop').toMatchTree(`
Assign
Identifier basename
AssignableIdentifier basename
operator =
Number 5
DotGet
@ -298,7 +298,7 @@ describe('DotGet whitespace sensitivity', () => {
test('space before dot - NOT DotGet, parses as division', () => {
expect('basename = 5; basename / prop').toMatchTree(`
Assign
Identifier basename
AssignableIdentifier basename
operator =
Number 5
BinOp

View File

@ -19,7 +19,7 @@ describe('if/elsif/else', () => {
expect('a = if x: 2').toMatchTree(`
Assign
Identifier a
AssignableIdentifier a
operator =
IfExpr
keyword if

View File

@ -17,7 +17,7 @@ describe('DotGet', () => {
test('obj.prop is DotGet when obj is assigned', () => {
expect('obj = 5; obj.prop').toMatchTree(`
Assign
Identifier obj
AssignableIdentifier obj
operator =
Number 5
DotGet
@ -31,7 +31,7 @@ describe('DotGet', () => {
FunctionDef
keyword fn
Params
Identifier config
AssignableIdentifier config
colon :
DotGet
IdentifierBeforeDot config
@ -45,7 +45,7 @@ describe('DotGet', () => {
FunctionDef
keyword fn
Params
Identifier x
AssignableIdentifier x
colon :
DotGet
IdentifierBeforeDot x
@ -63,8 +63,8 @@ end`).toMatchTree(`
FunctionDef
keyword fn
Params
Identifier x
Identifier y
AssignableIdentifier x
AssignableIdentifier y
colon :
DotGet
IdentifierBeforeDot x
@ -84,7 +84,7 @@ end`).toMatchTree(`
FunctionDef
keyword fn
Params
Identifier x
AssignableIdentifier x
colon :
DotGet
IdentifierBeforeDot x
@ -92,7 +92,7 @@ end`).toMatchTree(`
FunctionDef
keyword fn
Params
Identifier y
AssignableIdentifier y
colon :
DotGet
IdentifierBeforeDot y
@ -105,7 +105,7 @@ end`).toMatchTree(`
test('dot get works as function argument', () => {
expect('config = 42; echo config.path').toMatchTree(`
Assign
Identifier config
AssignableIdentifier config
operator =
Number 42
FunctionCall
@ -120,7 +120,7 @@ end`).toMatchTree(`
test('mixed file paths and dot get', () => {
expect('config = 42; cat readme.txt; echo config.path').toMatchTree(`
Assign
Identifier config
AssignableIdentifier config
operator =
Number 42
FunctionCall

View File

@ -72,7 +72,7 @@ describe('Fn', () => {
FunctionDef
keyword fn
Params
Identifier x
AssignableIdentifier x
colon :
BinOp
Identifier x
@ -86,8 +86,8 @@ describe('Fn', () => {
FunctionDef
keyword fn
Params
Identifier x
Identifier y
AssignableIdentifier x
AssignableIdentifier y
colon :
BinOp
Identifier x
@ -104,8 +104,8 @@ end`).toMatchTree(`
FunctionDef
keyword fn
Params
Identifier x
Identifier y
AssignableIdentifier x
AssignableIdentifier y
colon :
BinOp
Identifier x

View File

@ -21,16 +21,16 @@ describe('multiline', () => {
add 3 4
`).toMatchTree(`
Assign
Identifier add
AssignableIdentifier add
operator =
FunctionDef
keyword fn
Params
Identifier a
Identifier b
AssignableIdentifier a
AssignableIdentifier b
colon :
Assign
Identifier result
AssignableIdentifier result
operator =
BinOp
Identifier a
@ -63,8 +63,8 @@ end
FunctionDef
keyword fn
Params
Identifier x
Identifier y
AssignableIdentifier x
AssignableIdentifier y
colon :
FunctionCallOrIdentifier
Identifier x

View File

@ -50,7 +50,7 @@ describe('pipe expressions', () => {
test('pipe expression in assignment', () => {
expect('result = echo hello | grep h').toMatchTree(`
Assign
Identifier result
AssignableIdentifier result
operator =
PipeExpr
FunctionCall
@ -77,7 +77,7 @@ describe('pipe expressions', () => {
FunctionDef
keyword fn
Params
Identifier x
AssignableIdentifier x
colon :
FunctionCallOrIdentifier
Identifier x

View File

@ -1,63 +1,107 @@
import { ExternalTokenizer, InputStream, Stack } from '@lezer/lr'
import { Identifier, Word, IdentifierBeforeDot } from './shrimp.terms'
import type { Scope } from './scopeTracker'
import { Identifier, AssignableIdentifier, Word, IdentifierBeforeDot } from './shrimp.terms'
// The only chars that can't be words are whitespace, apostrophes, closing parens, and EOF.
export const tokenizer = new ExternalTokenizer(
(input: InputStream, stack: Stack) => {
let ch = getFullCodePoint(input, 0)
console.log(`🌭 checking char ${String.fromCodePoint(ch)}`)
const ch = getFullCodePoint(input, 0)
if (!isWordChar(ch)) return
let pos = getCharSize(ch)
let isValidIdentifier = isLowercaseLetter(ch) || isEmoji(ch)
const isValidStart = isLowercaseLetter(ch) || isEmoji(ch)
const canBeWord = stack.canShift(Word)
while (true) {
ch = getFullCodePoint(input, pos)
// Consume all word characters, tracking if it remains a valid identifier
const { pos, isValidIdentifier, stoppedAtDot } = consumeWordToken(
input,
isValidStart,
canBeWord
)
// Check for dot and scope - property access detection
if (ch === 46 /* . */ && isValidIdentifier) {
// Build identifier text by peeking character by character
let identifierText = ''
for (let i = 0; i < pos; i++) {
// Check if we should emit IdentifierBeforeDot for property access
if (stoppedAtDot) {
const dotGetToken = checkForDotGet(input, stack, pos)
if (dotGetToken) {
input.advance(pos)
input.acceptToken(dotGetToken)
} else {
// Not in scope - continue consuming the dot as part of the word
const afterDot = consumeRestOfWord(input, pos + 1, canBeWord)
input.advance(afterDot)
input.acceptToken(Word)
}
return
}
// Advance past the token we consumed
input.advance(pos)
// Choose which token to emit
if (isValidIdentifier) {
const token = chooseIdentifierToken(input, stack)
input.acceptToken(token)
} else {
input.acceptToken(Word)
}
},
{ contextual: true }
)
// Build identifier text from input stream, handling surrogate pairs for emoji
const buildIdentifierText = (input: InputStream, length: number): string => {
let text = ''
for (let i = 0; i < length; i++) {
const charCode = input.peek(i)
if (charCode === -1) break
// Handle surrogate pairs for emoji
if (charCode >= 0xd800 && charCode <= 0xdbff && i + 1 < pos) {
// Handle surrogate pairs for emoji (UTF-16 encoding)
if (charCode >= 0xd800 && charCode <= 0xdbff && i + 1 < length) {
const low = input.peek(i + 1)
if (low >= 0xdc00 && low <= 0xdfff) {
identifierText += String.fromCharCode(charCode, low)
text += String.fromCharCode(charCode, low)
i++ // Skip the low surrogate
continue
}
}
identifierText += String.fromCharCode(charCode)
text += String.fromCharCode(charCode)
}
return text
}
const scope = stack.context as Scope | undefined
// Consume word characters, tracking if it remains a valid identifier
// Returns the position after consuming, whether it's a valid identifier, and if we stopped at a dot
const consumeWordToken = (
input: InputStream,
isValidStart: boolean,
canBeWord: boolean
): { pos: number; isValidIdentifier: boolean; stoppedAtDot: boolean } => {
let pos = getCharSize(getFullCodePoint(input, 0))
let isValidIdentifier = isValidStart
let stoppedAtDot = false
if (scope?.has(identifierText)) {
// In scope - stop here, let grammar parse property access
input.advance(pos)
input.acceptToken(IdentifierBeforeDot)
return
}
// Not in scope - continue consuming as Word (fall through)
while (true) {
const ch = getFullCodePoint(input, pos)
// Stop at dot if we have a valid identifier (might be property access)
if (ch === 46 /* . */ && isValidIdentifier) {
stoppedAtDot = true
break
}
// Stop if we hit a non-word character
if (!isWordChar(ch)) break
// Certain characters might end a word or identifier if they are followed by whitespace.
// This allows things like `a = hello; 2` of if `x: y` to parse correctly.
// Context-aware termination: semicolon/colon can end a word if followed by whitespace
// This allows `hello; 2` to parse correctly while `hello;world` stays as one word
if (canBeWord && (ch === 59 /* ; */ || ch === 58) /* : */) {
const nextCh = getFullCodePoint(input, pos + 1)
if (!isWordChar(nextCh)) break
}
// Track identifier validity
if (!isLowercaseLetter(ch) && !isDigit(ch) && ch !== 45 && !isEmoji(ch)) {
// Track identifier validity: must be lowercase, digit, dash, or emoji
if (!isLowercaseLetter(ch) && !isDigit(ch) && ch !== 45 /* - */ && !isEmoji(ch)) {
if (!canBeWord) break
isValidIdentifier = false
}
@ -65,21 +109,73 @@ export const tokenizer = new ExternalTokenizer(
pos += getCharSize(ch)
}
input.advance(pos)
input.acceptToken(isValidIdentifier ? Identifier : Word)
},
{ contextual: true }
)
return { pos, isValidIdentifier, stoppedAtDot }
}
// Consume the rest of a word after we've decided not to treat a dot as DotGet
// Used when we have "file.txt" - we already consumed "file", now consume ".txt"
const consumeRestOfWord = (input: InputStream, startPos: number, canBeWord: boolean): number => {
let pos = startPos
while (true) {
const ch = getFullCodePoint(input, pos)
// Stop if we hit a non-word character
if (!isWordChar(ch)) break
// Context-aware termination for semicolon/colon
if (canBeWord && (ch === 59 /* ; */ || ch === 58) /* : */) {
const nextCh = getFullCodePoint(input, pos + 1)
if (!isWordChar(nextCh)) break
}
pos += getCharSize(ch)
}
return pos
}
// Check if this identifier is in scope (for property access detection)
// Returns IdentifierBeforeDot token if in scope, null otherwise
const checkForDotGet = (input: InputStream, stack: Stack, pos: number): number | null => {
const identifierText = buildIdentifierText(input, pos)
const context = stack.context as { scope: { has(name: string): boolean } } | undefined
// If identifier is in scope, this is property access (e.g., obj.prop)
// If not in scope, it should be consumed as a Word (e.g., file.txt)
return context?.scope.has(identifierText) ? IdentifierBeforeDot : null
}
// Decide between AssignableIdentifier and Identifier using grammar state + peek-ahead
const chooseIdentifierToken = (input: InputStream, stack: Stack): number => {
const canAssignable = stack.canShift(AssignableIdentifier)
const canRegular = stack.canShift(Identifier)
// Only one option is valid - use it
if (canAssignable && !canRegular) return AssignableIdentifier
if (canRegular && !canAssignable) return Identifier
// Both possible (ambiguous context) - peek ahead for '=' to disambiguate
// This happens at statement start where both `x = 5` (assign) and `echo x` (call) are valid
let peekPos = 0
while (true) {
const ch = getFullCodePoint(input, peekPos)
if (isWhiteSpace(ch)) {
peekPos += getCharSize(ch)
} else {
break
}
}
const nextCh = getFullCodePoint(input, peekPos)
return nextCh === 61 /* = */ ? AssignableIdentifier : Identifier
}
// Character classification helpers
const isWhiteSpace = (ch: number): boolean => {
return ch === 32 /* space */ || ch === 10 /* \n */ || ch === 9 /* tab */ || ch === 13 /* \r */
return ch === 32 /* space */ || ch === 9 /* tab */ || ch === 13 /* \r */
}
const isWordChar = (ch: number): boolean => {
const closingParen = ch === 41 /* ) */
const eof = ch === -1
return !isWhiteSpace(ch) && !closingParen && !eof
return !isWhiteSpace(ch) && ch !== 10 /* \n */ && ch !== 41 /* ) */ && ch !== -1 /* EOF */
}
const isLowercaseLetter = (ch: number): boolean => {
@ -103,7 +199,7 @@ const getFullCodePoint = (input: InputStream, pos: number): number => {
}
}
return ch // Single code unit
return ch
}
const isEmoji = (ch: number): boolean => {