Go to file
2025-10-03 14:34:02 -07:00
src wip 2025-10-03 14:34:02 -07:00
.gitattributes Initial commit 2025-09-25 20:17:27 -07:00
.gitignore Initial commit 2025-09-25 20:17:27 -07:00
build.ts Initial commit 2025-09-25 20:17:27 -07:00
bun-env.d.ts Initial commit 2025-09-25 20:17:27 -07:00
bun.lock editor 2025-09-26 16:16:09 -07:00
bunfig.toml emoji 2025-09-29 11:40:32 -07:00
CLAUDE.md Initial commit 2025-09-25 20:17:27 -07:00
package.json emoji 2025-09-29 11:40:32 -07:00
README.md wip 2025-10-03 08:15:02 -07:00
tsconfig.json OMG hairy 2025-10-02 14:05:17 -07:00

Shrimp Parser - Development Context

Overview

Building a command-line language parser using Lezer (CodeMirror's parser system) with TypeScript. The goal is to create a prototype that can parse commands with arguments, similar to shell syntax, with inline hints for autocompletion.

Current Architecture

Grammar Structure (shrimp.grammar)

  • Commands: Can be complete (Command) or partial (CommandPartial) for autocomplete
  • Arguments: Positional or named (with name=value syntax)
  • Key Challenge: Handling arbitrary text (like file paths) as arguments without conflicting with operators/keywords

Tokenizer Setup (tokenizers.ts)

  • Main tokenizer: Returns Command, CommandPartial, or Identifier based on context
  • Command matching: Uses matchCommand() to check against available commands
  • Context-aware: Uses stack.canShift() to return appropriate token based on parse position
  • Issue: Second occurrence of command name (e.g., tail tail) should be Identifier not Command

Key Design Decisions

  1. External tokenizers over regular tokens for commands to enable:

    • Dynamic command list (can change at runtime)
    • Partial matching for autocomplete
    • Context-aware tokenization
  2. Virtual semicolons for statement boundaries:

    • Using insertSemicolon external tokenizer
    • Inserts at newlines/EOF to keep parser "inside" CommandCall
    • Prevents tail t from parsing as two separate commands
  3. UnquotedArg token for paths/arbitrary text:

    • Accepts anything except whitespace/parens/equals
    • Only valid in command argument context
    • Avoids conflicts with operators elsewhere

Current Problems

  1. Parser completes CommandCall too early

    • After tail , cursor shows position in Program not CommandCall
    • Makes hint system harder to implement
  2. Command token in wrong context

    • tail tail - second "tail" returns Command token but should be Identifier
    • Need better context checking in tokenizer
  3. Inline hints need to be smarter

    • Must look backward to find command context
    • Handle cases where parser has "completed" the command

Test Infrastructure

  • Custom test matchers: toMatchTree, toEvaluateTo
  • Command source injection for testing: setCommandSource()
  • Tests in shrimp.test.ts

File Structure

src/parser/
  shrimp.grammar     - Lezer grammar definition
  tokenizers.ts      - External tokenizers
  shrimp.ts         - Generated parser

src/editor/
  commands.ts        - Command definitions
  plugins/
    inlineHints.tsx  - Autocomplete hint UI

Next Steps

  1. Fix tokenizer context checking with stack.canShift()
  2. Improve hint detection for "after command with space" case
  3. Consider if grammar structure changes would help

Key Concepts to Remember

  • Lezer is LR parser - builds tree bottom-up
  • External tokenizers run at each position
  • @skip { space } makes whitespace invisible to parser
  • Token precedence matters for overlap resolution
  • stack.canShift(tokenId) checks if token is valid at current position