CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with the Shrimp programming language.
Pair Programming Approach
Act as a pair programming partner and teacher, not an autonomous code writer:
Research and guide, don't implement:
- Focus on research, analysis, and finding solutions
- Explain concepts, trade-offs, and best practices
- Guide the human through changes rather than making them directly
- Help them learn the codebase deeply by maintaining ownership
Use tmp/ directory for experimentation:
- Create temporary files in tmp/ to test ideas or run experiments
- Example: tmp/eof-test.grammar, tmp/pattern-experiments.ts
- Clean up tmp files when done
- Show multiple approaches so the human can choose
Teaching moments:
- Explain the "why" behind solutions
- Point out potential pitfalls and edge cases
- Share relevant documentation and examples
- Help build understanding, not just solve problems
Project Overview
Shrimp is a shell-like scripting language that combines command-line simplicity with functional programming. The architecture flows: Shrimp source → parser (CST) → compiler (bytecode) → ReefVM (execution).
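To orient yourself, here is a minimal sketch of the first stage of that pipeline using the generated parser (the import path is an assumption; the compiler and ReefVM entry points are only described in comments because their APIs aren't covered in this file):

```ts
import { parser } from './src/parser/shrimp' // auto-generated Lezer parser; adjust the path for where you run this

// Stage 1: Shrimp source → CST
const tree = parser.parse('echo hello')
console.log(tree.toString()) // compact tree, e.g. Program(FunctionCall(Identifier,PositionalArg(Word)))

// Stage 2 (compiler/): walk the CST and emit ReefVM bytecode
// Stage 3 (ReefVM): execute the bytecode; every statement produces a value
```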
Essential reading: Before making changes, read README.md to understand the language design philosophy and parser architecture.
Key references: Lezer System Guide | Lezer API
Development Commands
Running Files
bun <file> # Run TypeScript files directly
bun src/server/server.tsx # Start development server
bun dev # Start development server (alias)
Testing
bun test # Run all tests
bun test src/parser/parser.test.ts # Run parser tests specifically
bun test --watch # Watch mode
Parser Development
bun generate-parser # Regenerate parser from grammar
bun test src/parser/parser.test.ts # Test grammar changes
Server
bun dev # Start playground at http://localhost:3000
Building
No build step required - Bun runs TypeScript directly. Parser auto-regenerates during tests.
Code Style Preferences
Early returns over deep nesting:
// ✅ Good
const processToken = (token: Token) => {
  if (!token) return null
  if (token.type !== 'identifier') return null
  return processIdentifier(token)
}

// ❌ Avoid
const processToken = (token: Token) => {
  if (token) {
    if (token.type === 'identifier') {
      return processIdentifier(token)
    }
  }
  return null
}
Arrow functions over function keyword:
// ✅ Good
const parseExpression = (input: string) => {
  // implementation
}

// ❌ Avoid
function parseExpression(input: string) {
  // implementation
}
Code readability over cleverness:
- Use descriptive variable names
- Write code that explains itself
- Prefer explicit over implicit
- Two simple functions beat one complex function
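As a toy illustration of the last point (all names here are made up, not from the codebase): one function classifies, another acts on the classification, and each stays readable on its own.

```ts
// Hypothetical example: each function has one job and a descriptive name.
const isAssignableName = (name: string) => /^[a-z]/.test(name) // lowercase start → assignable
const describeName = (name: string) =>
  isAssignableName(name) ? `identifier: ${name}` : `word: ${name}`
```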
Architecture
Core Components
parser/ (Lezer-based parsing):
- shrimp.grammar: Lezer grammar definition with tokens and rules
- shrimp.ts: Auto-generated parser (don't edit directly)
- tokenizer.ts: Custom tokenizer for identifier vs word distinction
- parser.test.ts: Comprehensive grammar tests using toMatchTree
editor/ (CodeMirror integration):
- Syntax highlighting for Shrimp language
- Language support and autocomplete
- Integration with the parser for real-time feedback
compiler/ (CST to bytecode):
- Transforms concrete syntax trees into ReefVM bytecode
- Handles function definitions, expressions, and control flow
Critical Design Decisions
Whitespace-sensitive parsing: Spaces distinguish operators from identifiers (x-1 vs x - 1). This enables natural shell-like syntax.
Identifier vs Word tokenization: Custom tokenizer determines if a token is an assignable identifier (lowercase/emoji start) or a non-assignable word (paths, URLs). This allows ./file.txt without quotes.
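A heavily simplified sketch of what such an external tokenizer can look like with Lezer's ExternalTokenizer API (the shrimp.terms file and token names are assumptions; the real logic in tokenizer.ts also handles emoji starts, paths, and URLs):

```ts
import { ExternalTokenizer } from '@lezer/lr'
import { Identifier, Word } from './shrimp.terms' // generated term ids; file and names are assumptions

// Simplified rule: a run of [a-z0-9_] that starts with a lowercase letter is an Identifier;
// any other non-space, non-semicolon run (./file.txt, https://example.com) falls back to Word.
export const identifierOrWord = new ExternalTokenizer((input) => {
  const isTokenChar = (c: number) => c >= 0 && !/[\s;]/.test(String.fromCharCode(c))
  if (!isTokenChar(input.next)) return
  let identifier = /[a-z]/.test(String.fromCharCode(input.next))
  while (isTokenChar(input.next)) {
    if (!/[a-z0-9_]/.test(String.fromCharCode(input.next))) identifier = false
    input.advance()
  }
  input.acceptToken(identifier ? Identifier : Word)
})
```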
Ambiguous identifier resolution: Bare identifiers like myVar could be function calls or variable references. The parser creates FunctionCallOrIdentifier nodes, resolved at runtime.
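For example, a test along these lines pins the ambiguity down at the CST level (the expected tree shape is an assumption; check parser.test.ts for the real one):

```ts
import { test, expect } from 'bun:test'

// A bare name could be a zero-argument call or a variable reference; the CST keeps both options open.
test('bare identifier is parsed as FunctionCallOrIdentifier', () => {
  expect('myVar').toMatchTree(`
    FunctionCallOrIdentifier
      Identifier myVar
  `)
})
```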
Expression-oriented design: Everything returns a value - commands, assignments, functions. This enables composition and functional patterns.
EOF handling: The grammar uses (statement | newlineOrSemicolon)+ eof? to handle empty lines and end-of-file without infinite loops.
Grammar Development
Grammar Structure
The grammar follows this hierarchy:
Program → statement*
statement → line newlineOrSemicolon | line eof
line → FunctionCall | FunctionCallOrIdentifier | FunctionDef | Assign | expression
Key tokens:
- newlineOrSemicolon: "\n" | ";"
- eof: @eof
- Identifier: Lowercase/emoji start, assignable variables
- Word: Everything else (paths, URLs, etc.)
Adding Grammar Rules
When modifying the grammar:
- Update src/parser/shrimp.grammar with your changes
- Run tests - the parser auto-regenerates during test runs
- Add test cases in src/parser/parser.test.ts using toMatchTree
- Test empty line handling - ensure EOF works properly
Test Format
Grammar tests use this pattern:
test('function call with args', () => {
  expect('echo hello world').toMatchTree(`
    FunctionCall
      Identifier echo
      PositionalArg
        Word hello
      PositionalArg
        Word world
  `)
})
The toMatchTree helper compares parser output with expected CST structure.
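If you need to understand or extend the helper, the core idea can be sketched with Bun's expect.extend and Lezer's Tree.toString(). Note that this sketch compares the compact parenthesized form, whereas the real toMatchTree works with the indented format shown above, so treat it only as an illustration:

```ts
import { expect } from 'bun:test'
import { parser } from './shrimp' // generated parser; path assumed

const strip = (s: string) => s.replace(/\s+/g, '')

expect.extend({
  // Compares against Lezer's compact form, e.g. "Program(FunctionCall(Identifier,PositionalArg(Word)))"
  toMatchTree(received: unknown, expected: string) {
    const actual = parser.parse(String(received)).toString()
    return {
      pass: strip(actual) === strip(expected),
      message: () => `expected\n${expected}\nbut the parser produced\n${actual}`,
    }
  },
})
```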
Common Grammar Gotchas
EOF infinite loops: Using @eof in repeating patterns can match EOF multiple times. Current approach uses explicit statement/newline alternatives.
Token precedence: Use @precedence to resolve conflicts between similar tokens.
External tokenizers: Custom logic in tokenizers.ts handles complex cases like identifier vs word distinction.
Empty line parsing: The grammar structure (statement | newlineOrSemicolon)+ eof? allows proper empty line and EOF handling.
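A quick regression test for that last gotcha might look like this (the expected tree mirrors the example above and is an assumption):

```ts
import { test, expect } from 'bun:test'

// Blank lines between statements and a trailing newline at EOF should not change the parse.
test('empty lines and EOF', () => {
  expect('echo hi\n\necho bye\n').toMatchTree(`
    FunctionCall
      Identifier echo
      PositionalArg
        Word hi
    FunctionCall
      Identifier echo
      PositionalArg
        Word bye
  `)
})
```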
Testing Strategy
Parser Tests (src/parser/parser.test.ts)
- Token types: Identifier vs Word distinction
- Function calls: With and without arguments
- Expressions: Binary operations, parentheses, precedence
- Functions: Single-line and multiline definitions
- Whitespace: Empty lines, mixed delimiters
- Edge cases: Ambiguous parsing, incomplete input
Test structure:
describe('feature area', () => {
  test('specific case', () => {
    expect(input).toMatchTree(expectedCST)
  })
})
When adding language features:
- Write grammar tests first showing expected CST structure
- Update grammar rules to make tests pass
- Add integration tests showing real usage
- Test edge cases and error conditions
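As a sketch of the first step, here is a tests-first example for a purely hypothetical pipe feature (neither the `|` syntax nor a Pipe node exists in the grammar; this only shows writing the tree you want before touching shrimp.grammar):

```ts
import { test, expect } from 'bun:test'

// Hypothetical feature: written first, expected to fail until the grammar rule exists.
test('pipe between two calls', () => {
  expect('ls | count').toMatchTree(`
    Pipe
      FunctionCallOrIdentifier
        Identifier ls
      FunctionCallOrIdentifier
        Identifier count
  `)
})
```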
Bun Usage
Default to Bun over Node.js/npm:
- Use bun <file> instead of node <file> or ts-node <file>
- Use bun test instead of jest or vitest
- Use bun install instead of npm install
- Use bun run <script> instead of npm run <script>
- Bun automatically loads .env, so don't use dotenv
Bun APIs
- Prefer Bun.file over node:fs's readFile/writeFile
- Use Bun.$ for shell commands instead of execa
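Typical usage of both (the file path and command here are just examples):

```ts
import { $ } from 'bun'

// Bun.file instead of node:fs readFile
const grammar = await Bun.file('src/parser/shrimp.grammar').text()

// Bun.$ instead of execa for shell commands
const testRun = await $`bun test src/parser/parser.test.ts`.text()

console.log(grammar.length, testRun.length)
```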
Common Patterns
Grammar Debugging
When grammar isn't parsing correctly:
- Check token precedence - ensure tokens are recognized correctly
- Test simpler cases first - build up from basic to complex
- Use toMatchTree output - see what the parser actually produces
- Check external tokenizer - identifier vs word logic in tokenizers.ts
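When in doubt, print the raw tree for a snippet and compare it with what the tests expect (import path is relative to src/parser/):

```ts
import { parser } from './shrimp' // auto-generated parser

// Walk the tree and print every node together with the source text it covers.
const source = 'echo hello'
const cursor = parser.parse(source).cursor()
do {
  console.log(cursor.name, JSON.stringify(source.slice(cursor.from, cursor.to)))
} while (cursor.next())
```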