Go to file

Corey Johnson eff09931ad wip		2025-10-03 14:34:02 -07:00
src	wip	2025-10-03 14:34:02 -07:00
.gitattributes	Initial commit	2025-09-25 20:17:27 -07:00
.gitignore	Initial commit	2025-09-25 20:17:27 -07:00
build.ts	Initial commit	2025-09-25 20:17:27 -07:00
bun-env.d.ts	Initial commit	2025-09-25 20:17:27 -07:00
bun.lock	editor	2025-09-26 16:16:09 -07:00
bunfig.toml	emoji	2025-09-29 11:40:32 -07:00
CLAUDE.md	Initial commit	2025-09-25 20:17:27 -07:00
package.json	emoji	2025-09-29 11:40:32 -07:00
README.md	wip	2025-10-03 08:15:02 -07:00
tsconfig.json	OMG hairy	2025-10-02 14:05:17 -07:00

README.md

Shrimp Parser - Development Context

Overview

Building a command-line language parser using Lezer (CodeMirror's parser system) with TypeScript. The goal is to create a prototype that can parse commands with arguments, similar to shell syntax, with inline hints for autocompletion.

Current Architecture

Grammar Structure (`shrimp.grammar`)

Commands: Can be complete (Command) or partial (CommandPartial) for autocomplete
Arguments: Positional or named (with name=value syntax)
Key Challenge: Handling arbitrary text (like file paths) as arguments without conflicting with operators/keywords

Tokenizer Setup (`tokenizers.ts`)

Main tokenizer: Returns Command, CommandPartial, or Identifier based on context
Command matching: Uses matchCommand() to check against available commands
Context-aware: Uses stack.canShift() to return appropriate token based on parse position
Issue: Second occurrence of command name (e.g., tail tail) should be Identifier not Command

Key Design Decisions

External tokenizers over regular tokens for commands to enable:
- Dynamic command list (can change at runtime)
- Partial matching for autocomplete
- Context-aware tokenization
Virtual semicolons for statement boundaries:
- Using insertSemicolon external tokenizer
- Inserts at newlines/EOF to keep parser "inside" CommandCall
- Prevents tail t from parsing as two separate commands
UnquotedArg token for paths/arbitrary text:
- Accepts anything except whitespace/parens/equals
- Only valid in command argument context
- Avoids conflicts with operators elsewhere

Current Problems

Parser completes CommandCall too early
- After tail , cursor shows position in Program not CommandCall
- Makes hint system harder to implement
Command token in wrong context
- tail tail - second "tail" returns Command token but should be Identifier
- Need better context checking in tokenizer
Inline hints need to be smarter
- Must look backward to find command context
- Handle cases where parser has "completed" the command

Test Infrastructure

Custom test matchers: toMatchTree, toEvaluateTo
Command source injection for testing: setCommandSource()
Tests in shrimp.test.ts

File Structure

src/parser/
  shrimp.grammar     - Lezer grammar definition
  tokenizers.ts      - External tokenizers
  shrimp.ts         - Generated parser

src/editor/
  commands.ts        - Command definitions
  plugins/
    inlineHints.tsx  - Autocomplete hint UI

Next Steps

Fix tokenizer context checking with stack.canShift()
Improve hint detection for "after command with space" case
Consider if grammar structure changes would help

Key Concepts to Remember

Lezer is LR parser - builds tree bottom-up
External tokenizers run at each position
@skip { space } makes whitespace invisible to parser
Token precedence matters for overlap resolution
stack.canShift(tokenId) checks if token is valid at current position