Shrimp Parser - Development Context
Overview
Building a command-line language parser using Lezer (CodeMirror's parser system) with TypeScript. The goal is to create a prototype that can parse commands with arguments, similar to shell syntax, with inline hints for autocompletion.
Current Architecture
Grammar Structure (shrimp.grammar)
- Commands: Can be complete (Command) or partial (CommandPartial) for autocomplete
- Arguments: Positional or named (with name=value syntax)
- Key Challenge: Handling arbitrary text (like file paths) as arguments without conflicting with operators/keywords
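To make the positional/named distinction concrete, here is a minimal sketch of the shape an argument might take once parsed. The Arg type and classifyArg helper are hypothetical names used for illustration only; in the actual project this split is expressed in the grammar rather than by string splitting.

```typescript
// Hypothetical result shape for a parsed argument (not taken from the project).
type Arg =
  | { kind: "positional"; value: string }
  | { kind: "named"; name: string; value: string };

// Classify raw argument text as named (name=value) or positional.
// Assumption: text with no "=" or with a leading "=" is treated as positional.
function classifyArg(text: string): Arg {
  const eq = text.indexOf("=");
  if (eq <= 0) return { kind: "positional", value: text };
  return { kind: "named", name: text.slice(0, eq), value: text.slice(eq + 1) };
}

console.log(classifyArg("./logs/app.log")); // positional
console.log(classifyArg("lines=20"));       // named: lines -> 20
```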
Tokenizer Setup (tokenizers.ts)
- Main tokenizer: Returns Command, CommandPartial, or Identifier based on context
- Command matching: Uses matchCommand() to check against available commands
- Context-aware: Uses stack.canShift() to return appropriate token based on parse position
- Issue: Second occurrence of command name (e.g., tail tail) should be Identifier, not Command
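The decision the tokenizer has to make can be sketched as a pure function. This is not the project's tokenizer: matchCommand here is a hypothetical reimplementation, and the canShift parameter is a stand-in for stack.canShift(), assumed to report false for Command once the parser is inside a command's arguments.

```typescript
// Token kinds named in the notes; the real tokenizer would use the numeric
// term ids exported by the generated parser instead of strings.
type TokenKind = "Command" | "CommandPartial" | "Identifier";

// Hypothetical stand-in for matchCommand(): reports whether `word` is an
// exact command name or a strict prefix of at least one command.
function matchCommand(word: string, commands: readonly string[]) {
  return {
    exact: commands.includes(word),
    partial: word.length > 0 && commands.some((c) => c !== word && c.startsWith(word)),
  };
}

// `canShift` stands in for stack.canShift(term): whether the parser could
// accept a token of that kind at the current position.
function classifyWord(
  word: string,
  commands: readonly string[],
  canShift: (kind: TokenKind) => boolean,
): TokenKind {
  const { exact, partial } = matchCommand(word, commands);
  if (exact && canShift("Command")) return "Command";
  if (partial && canShift("CommandPartial")) return "CommandPartial";
  return "Identifier";
}

const commands = ["tail", "head", "grep"];
const atStatementStart = (_kind: TokenKind) => true;
const inArguments = (kind: TokenKind) => kind === "Identifier";

console.log(classifyWord("tail", commands, atStatementStart)); // Command
console.log(classifyWord("ta", commands, atStatementStart));   // CommandPartial
console.log(classifyWord("tail", commands, inArguments));      // Identifier
```

Under that assumption the second tail in tail tail falls through to Identifier. If canShift still reports Command as valid in argument position, the grammar itself would need to change.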
Key Design Decisions
- External tokenizers over regular tokens for commands to enable:
  - Dynamic command list (can change at runtime)
  - Partial matching for autocomplete
  - Context-aware tokenization
- Virtual semicolons for statement boundaries:
  - Using insertSemicolon external tokenizer
  - Inserts at newlines/EOF to keep parser "inside" CommandCall
  - Prevents tail t from parsing as two separate commands
- UnquotedArg token for paths/arbitrary text:
  - Accepts anything except whitespace/parens/equals
  - Only valid in command argument context
  - Avoids conflicts with operators elsewhere
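The character-level rules behind the last two decisions can be sketched without any parser machinery. The function names below are hypothetical and operate on plain strings; the real tokenizers would read from Lezer's input stream and emit tokens instead of returning offsets.

```typescript
// Characters that end an UnquotedArg, per the notes: whitespace, parens, equals.
function isArgTerminator(ch: string): boolean {
  return ch === "(" || ch === ")" || ch === "=" || /\s/.test(ch);
}

// Returns the end offset of the UnquotedArg starting at `start`,
// or -1 when no argument text starts there.
function scanUnquotedArg(input: string, start: number): number {
  let pos = start;
  while (pos < input.length && !isArgTerminator(input.charAt(pos))) pos++;
  return pos > start ? pos : -1;
}

// A virtual semicolon is a candidate only at a newline or at end of input.
function needsVirtualSemicolon(input: string, pos: number): boolean {
  return pos >= input.length || input.charAt(pos) === "\n";
}

const line = "tail ./logs/app.log lines=20";
const end = scanUnquotedArg(line, 5);
console.log(line.slice(5, end));                        // ./logs/app.log
console.log(needsVirtualSemicolon(line, 4));            // false (space after tail)
console.log(needsVirtualSemicolon(line, line.length));  // true (EOF)
```

Because a space is not a statement boundary in this sketch, tail t stays a single statement until a newline or EOF is reached.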
Current Problems
- Parser completes CommandCall too early
  - After tail, cursor shows position in Program, not CommandCall
  - Makes hint system harder to implement
- Command token in wrong context
  - tail tail: second "tail" returns Command token but should be Identifier
  - Need better context checking in tokenizer
- Inline hints need to be smarter
  - Must look backward to find command context
  - Handle cases where parser has "completed" the command
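One possible way to look backward for command context is to work from the document text rather than the syntax tree, which sidesteps the case where the parser has already closed the CommandCall. The sketch below is only an illustration: HintContext and findHintContext are hypothetical names, and it assumes the command is the first word on the cursor's line.

```typescript
interface HintContext {
  command: string;            // first word on the cursor's line
  argPrefix: string;          // text being typed at the cursor ("" right after a space)
  afterCommandSpace: boolean; // true when only whitespace follows the command
}

// Look backward from the cursor to the start of the line and recover the
// command plus whatever argument text is currently being typed.
function findHintContext(doc: string, cursor: number): HintContext | null {
  const lineStart = cursor === 0 ? 0 : doc.lastIndexOf("\n", cursor - 1) + 1;
  const before = doc.slice(lineStart, cursor).trimStart();
  const firstSpace = before.search(/\s/);
  // No whitespace yet: the user is still typing the command name itself.
  if (firstSpace === -1) return null;
  const rest = before.slice(firstSpace);
  const argPrefix = /(\S*)$/.exec(rest)?.[1] ?? "";
  return {
    command: before.slice(0, firstSpace),
    argPrefix,
    afterCommandSpace: rest.trim() === "",
  };
}

console.log(findHintContext("tail ", 5));
console.log(findHintContext("ls\ntail ./lo", 12));
console.log(findHintContext("ta", 2)); // null
```

A text-based lookup like this trades precision for robustness: it ignores quoting and nesting, so it would likely serve as a fallback next to a tree-based check, not as a replacement for one.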
Test Infrastructure
- Custom test matchers: toMatchTree, toEvaluateTo
- Command source injection for testing: setCommandSource()
- Tests in shrimp.test.ts
File Structure
src/parser/
  shrimp.grammar - Lezer grammar definition
  tokenizers.ts - External tokenizers
  shrimp.ts - Generated parser
src/editor/
  commands.ts - Command definitions
  plugins/
    inlineHints.tsx - Autocomplete hint UI
Next Steps
- Fix tokenizer context checking with stack.canShift()
- Improve hint detection for "after command with space" case
- Consider if grammar structure changes would help
Key Concepts to Remember
- Lezer is an LR parser: it builds the tree bottom-up
- External tokenizers run at each position
- @skip { space } makes whitespace invisible to the parser
- Token precedence matters for overlap resolution
- stack.canShift(tokenId) checks if a token is valid at the current position