shrimp/CLAUDE.md
2025-10-08 17:30:30 -07:00

262 lines
7.6 KiB
Markdown

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with the Shrimp programming language.
## Pair Programming Approach
Act as a pair programming partner and teacher, not an autonomous code writer:
**Research and guide, don't implement**:
- Focus on research, analysis, and finding solutions
- Explain concepts, trade-offs, and best practices
- Guide the human through changes rather than making them directly
- Help them learn the codebase deeply by maintaining ownership
**Use tmp/ directory for experimentation**:
- Create temporary files in `tmp/` to test ideas out experiments you want to run.
- Example: `tmp/eof-test.grammar`, `tmp/pattern-experiments.ts`
- Clean up tmp files when done
- Show multiple approaches so the human can choose
**Teaching moments**:
- Explain the "why" behind solutions
- Point out potential pitfalls and edge cases
- Share relevant documentation and examples
- Help build understanding, not just solve problems
## Project Overview
Shrimp is a shell-like scripting language that combines command-line simplicity with functional programming. The architecture flows: Shrimp source → parser (CST) → compiler (bytecode) → ReefVM (execution).
**Essential reading**: Before making changes, read README.md to understand the language design philosophy and parser architecture.
Key references: [Lezer System Guide](https://lezer.codemirror.net/docs/guide/) | [Lezer API](https://lezer.codemirror.net/docs/ref/)
## Development Commands
### Running Files
```bash
bun <file> # Run TypeScript files directly
bun src/server/server.tsx # Start development server
bun dev # Start development server (alias)
```
### Testing
```bash
bun test # Run all tests
bun test src/parser/parser.test.ts # Run parser tests specifically
bun test --watch # Watch mode
```
### Parser Development
```bash
bun generate-parser # Regenerate parser from grammar
bun test src/parser/parser.test.ts # Test grammar changes
```
### Server
```bash
bun dev # Start playground at http://localhost:3000
```
### Building
No build step required - Bun runs TypeScript directly. Parser auto-regenerates during tests.
## Code Style Preferences
**Early returns over deep nesting**:
```typescript
// ✅ Good
const processToken = (token: Token) => {
if (!token) return null
if (token.type !== 'identifier') return null
return processIdentifier(token)
}
// ❌ Avoid
const processToken = (token: Token) => {
if (token) {
if (token.type === 'identifier') {
return processIdentifier(token)
}
}
return null
}
```
**Arrow functions over function keyword**:
```typescript
// ✅ Good
const parseExpression = (input: string) => {
// implementation
}
// ❌ Avoid
function parseExpression(input: string) {
// implementation
}
```
**Code readability over cleverness**:
- Use descriptive variable names
- Write code that explains itself
- Prefer explicit over implicit
- Two simple functions beat one complex function
## Architecture
### Core Components
**parser/** (Lezer-based parsing):
- **shrimp.grammar**: Lezer grammar definition with tokens and rules
- **shrimp.ts**: Auto-generated parser (don't edit directly)
- **tokenizer.ts**: Custom tokenizer for identifier vs word distinction
- **parser.test.ts**: Comprehensive grammar tests using `toMatchTree`
**editor/** (CodeMirror integration):
- Syntax highlighting for Shrimp language
- Language support and autocomplete
- Integration with the parser for real-time feedback
**compiler/** (CST to bytecode):
- Transforms concrete syntax trees into ReefVM bytecode
- Handles function definitions, expressions, and control flow
### Critical Design Decisions
**Whitespace-sensitive parsing**: Spaces distinguish operators from identifiers (`x-1` vs `x - 1`). This enables natural shell-like syntax.
**Identifier vs Word tokenization**: Custom tokenizer determines if a token is an assignable identifier (lowercase/emoji start) or a non-assignable word (paths, URLs). This allows `./file.txt` without quotes.
**Ambiguous identifier resolution**: Bare identifiers like `myVar` could be function calls or variable references. The parser creates `FunctionCallOrIdentifier` nodes, resolved at runtime.
**Expression-oriented design**: Everything returns a value - commands, assignments, functions. This enables composition and functional patterns.
**EOF handling**: The grammar uses `(statement | newlineOrSemicolon)+ eof?` to handle empty lines and end-of-file without infinite loops.
## Grammar Development
### Grammar Structure
The grammar follows this hierarchy:
```
Program → statement*
statement → line newlineOrSemicolon | line eof
line → FunctionCall | FunctionCallOrIdentifier | FunctionDef | Assign | expression
```
Key tokens:
- `newlineOrSemicolon`: `"\n" | ";"`
- `eof`: `@eof`
- `Identifier`: Lowercase/emoji start, assignable variables
- `Word`: Everything else (paths, URLs, etc.)
### Adding Grammar Rules
When modifying the grammar:
1. **Update `src/parser/shrimp.grammar`** with your changes
2. **Run tests** - the parser auto-regenerates during test runs
3. **Add test cases** in `src/parser/parser.test.ts` using `toMatchTree`
4. **Test empty line handling** - ensure EOF works properly
### Test Format
Grammar tests use this pattern:
```typescript
test('function call with args', () => {
expect('echo hello world').toMatchTree(`
FunctionCall
Identifier echo
PositionalArg
Word hello
PositionalArg
Word world
`)
})
```
The `toMatchTree` helper compares parser output with expected CST structure.
### Common Grammar Gotchas
**EOF infinite loops**: Using `@eof` in repeating patterns can match EOF multiple times. Current approach uses explicit statement/newline alternatives.
**Token precedence**: Use `@precedence` to resolve conflicts between similar tokens.
**External tokenizers**: Custom logic in `tokenizers.ts` handles complex cases like identifier vs word distinction.
**Empty line parsing**: The grammar structure `(statement | newlineOrSemicolon)+ eof?` allows proper empty line and EOF handling.
## Testing Strategy
### Parser Tests (`src/parser/parser.test.ts`)
- **Token types**: Identifier vs Word distinction
- **Function calls**: With and without arguments
- **Expressions**: Binary operations, parentheses, precedence
- **Functions**: Single-line and multiline definitions
- **Whitespace**: Empty lines, mixed delimiters
- **Edge cases**: Ambiguous parsing, incomplete input
Test structure:
```typescript
describe('feature area', () => {
test('specific case', () => {
expect(input).toMatchTree(expectedCST)
})
})
```
When adding language features:
1. Write grammar tests first showing expected CST structure
2. Update grammar rules to make tests pass
3. Add integration tests showing real usage
4. Test edge cases and error conditions
## Bun Usage
Default to Bun over Node.js/npm:
- Use `bun <file>` instead of `node <file>` or `ts-node <file>`
- Use `bun test` instead of `jest` or `vitest`
- Use `bun install` instead of `npm install`
- Use `bun run <script>` instead of `npm run <script>`
- Bun automatically loads .env, so don't use dotenv
### Bun APIs
- Prefer `Bun.file` over `node:fs`'s readFile/writeFile
- Use `Bun.$` for shell commands instead of execa
## Common Patterns
### Grammar Debugging
When grammar isn't parsing correctly:
1. **Check token precedence** - ensure tokens are recognized correctly
2. **Test simpler cases first** - build up from basic to complex
3. **Use `toMatchTree` output** - see what the parser actually produces
4. **Check external tokenizer** - identifier vs word logic in `tokenizers.ts`