shrimp/README.md
2025-10-03 08:15:02 -07:00

90 lines
3.1 KiB
Markdown

# Shrimp Parser - Development Context
## Overview
Building a command-line language parser using Lezer (CodeMirror's parser system) with TypeScript. The goal is to create a prototype that can parse commands with arguments, similar to shell syntax, with inline hints for autocompletion.
## Current Architecture
### Grammar Structure (`shrimp.grammar`)
- **Commands**: Can be complete (`Command`) or partial (`CommandPartial`) for autocomplete
- **Arguments**: Positional or named (with `name=value` syntax)
- **Key Challenge**: Handling arbitrary text (like file paths) as arguments without conflicting with operators/keywords
### Tokenizer Setup (`tokenizers.ts`)
- **Main tokenizer**: Returns `Command`, `CommandPartial`, or `Identifier` based on context
- **Command matching**: Uses `matchCommand()` to check against available commands
- **Context-aware**: Uses `stack.canShift()` to return appropriate token based on parse position
- **Issue**: Second occurrence of command name (e.g., `tail tail`) should be `Identifier` not `Command`
### Key Design Decisions
1. **External tokenizers over regular tokens** for commands to enable:
- Dynamic command list (can change at runtime)
- Partial matching for autocomplete
- Context-aware tokenization
2. **Virtual semicolons** for statement boundaries:
- Using `insertSemicolon` external tokenizer
- Inserts at newlines/EOF to keep parser "inside" CommandCall
- Prevents `tail t` from parsing as two separate commands
3. **UnquotedArg token** for paths/arbitrary text:
- Accepts anything except whitespace/parens/equals
- Only valid in command argument context
- Avoids conflicts with operators elsewhere
### Current Problems
1. **Parser completes CommandCall too early**
- After `tail `, cursor shows position in `Program` not `CommandCall`
- Makes hint system harder to implement
2. **Command token in wrong context**
- `tail tail` - second "tail" returns `Command` token but should be `Identifier`
- Need better context checking in tokenizer
3. **Inline hints need to be smarter**
- Must look backward to find command context
- Handle cases where parser has "completed" the command
### Test Infrastructure
- Custom test matchers: `toMatchTree`, `toEvaluateTo`
- Command source injection for testing: `setCommandSource()`
- Tests in `shrimp.test.ts`
### File Structure
```
src/parser/
shrimp.grammar - Lezer grammar definition
tokenizers.ts - External tokenizers
shrimp.ts - Generated parser
src/editor/
commands.ts - Command definitions
plugins/
inlineHints.tsx - Autocomplete hint UI
```
## Next Steps
1. Fix tokenizer context checking with `stack.canShift()`
2. Improve hint detection for "after command with space" case
3. Consider if grammar structure changes would help
## Key Concepts to Remember
- Lezer is LR parser - builds tree bottom-up
- External tokenizers run at each position
- `@skip { space }` makes whitespace invisible to parser
- Token precedence matters for overlap resolution
- `stack.canShift(tokenId)` checks if token is valid at current position