sandlot/rust-sandlot/TESTING.md

# Sandlot Rust Rewrite: VM Integration Testing

This document describes how to test the Rust rewrite of sandlot against the TypeScript original. The goal is to verify **identical behavior** for every command that interacts with the VM/container, git worktrees, or session state.

## Prerequisites

- macOS on Apple Silicon
- Apple Container installed (`brew install container`)
- Rust toolchain (`rustup`)
- Bun installed (`brew install oven-sh/bun/bun`)
- An `ANTHROPIC_API_KEY` in `~/.env` (format: `ANTHROPIC_API_KEY=sk-ant-...`)
- A git repo to use as a test bed (create a throwaway one)

## Setup

### 1. Build the Rust binary

```bash
cd rust-sandlot
cargo build --release
```

The binary is at `./rust-sandlot/target/release/sandlot`.

### 2. Set up aliases

Use two distinct aliases so you can run either implementation:

```bash
alias sandlot-ts='bun run /path/to/rust-rewrite/src/cli.ts'
alias sandlot-rs='/path/to/rust-rewrite/rust-sandlot/target/release/sandlot'
```

### 3. Destroy any existing VM

Start from a clean slate. Both implementations share the same container name (`sandlot`), so only one can be tested at a time:

```bash
sandlot-ts vm destroy 2>/dev/null
```

### 4. Create a test repo

```bash
mkdir /tmp/sandlot-test-repo && cd /tmp/sandlot-test-repo
git init -b main  # the tests below assume the default branch is "main"
echo "hello" > README.md
git add . && git commit -m "initial commit"
```

All tests below assume you run commands from inside this repo.

---

## Testing methodology

For each test:

1. Run the command with `sandlot-ts` first, observe the result
2. Clean up / reset state
3. Run the same command with `sandlot-rs`, observe the result
4. Compare: stdout content, stderr content, exit code, and side effects (files created, git state, container state)
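
Step 4 is easier when stdout, stderr, and the exit code of each run are stored separately. A minimal sketch of such a capture helper follows; the `capture` function and the `OUT_DIR` scratch directory are illustrative assumptions, not part of sandlot.

```bash
# Hypothetical helper: run a command and store stdout, stderr, and the exit
# code in separate files under $OUT_DIR (an assumed scratch location).
OUT_DIR=${OUT_DIR:-/tmp/sandlot-compare}

capture() {
  cap_label=$1; shift
  cap_code=0
  mkdir -p "$OUT_DIR"
  "$@" >"$OUT_DIR/$cap_label.stdout" 2>"$OUT_DIR/$cap_label.stderr" || cap_code=$?
  echo "$cap_code" >"$OUT_DIR/$cap_label.exit"
}

# Example (reset state between the two runs). Pass the real command rather
# than an alias: aliases are not expanded when given as arguments.
#   capture ts-list bun run /path/to/rust-rewrite/src/cli.ts list
#   capture rs-list /path/to/rust-rewrite/rust-sandlot/target/release/sandlot list
#   diff "$OUT_DIR/ts-list.exit" "$OUT_DIR/rs-list.exit"
```

Keeping the three streams in separate files lets each one be compared with plain `diff`.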

Some commands produce animated spinner output on stderr. The final line of spinner output is what matters (the success/failure message). Intermediate spinner frames are cosmetic and may differ in timing.

When comparing output, strip ANSI codes for semantic comparison:

```bash
# $'...' makes the shell emit the ESC byte; BSD sed on macOS may not interpret \x1b itself
sandlot-rs list 2>&1 | sed $'s/\x1b\\[[0-9;]*m//g'
sandlot-ts list 2>&1 | sed $'s/\x1b\\[[0-9;]*m//g'
```
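
The stripping step can be wrapped in a reusable filter, together with a helper that keeps only the final spinner line. This is a sketch: the function names are made up for illustration, and `final_line` assumes the spinner redraws its frames with carriage returns, which may not match how either implementation renders.

```bash
# Strip SGR color codes. The ESC byte comes from printf, so the pattern does
# not depend on sed understanding \x1b.
strip_ansi() {
  sa_esc=$(printf '\033')
  sed "s/${sa_esc}\[[0-9;]*m//g"
}

# Assumption: spinner frames are separated by carriage returns. Split on \r,
# drop blank lines, and keep only the last remaining line.
final_line() {
  tr '\r' '\n' | sed '/^[[:space:]]*$/d' | tail -n 1
}

# Example:
#   sandlot-rs vm stop 2>&1 >/dev/null | strip_ansi | final_line
```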

---

## Phase 1: VM Lifecycle

These tests verify container management. Run them in order.

### Test 1.1: `vm create`

```bash
sandlot-ts vm destroy 2>/dev/null  # clean slate
sandlot-rs vm create
```

**Expect:**
- Spinner output on stderr progressing through: "Creating VM" -> "Pulling image & creating container" -> "Installing packages" -> "Installing Bun" -> "Installing Claude Code" -> "Installing neofetch" -> "Installing Neovim" -> "Configuring environment"
- Final line: `✔ VM created`
- Exit code: 0

**Verify side effects:**
```bash
container list --format json --all  # should show "sandlot" container running
container exec sandlot which claude  # should print /home/ubuntu/.local/bin/claude
container exec sandlot which bun     # should print /home/ubuntu/.local/bin/bun
container exec sandlot which fish    # should print /usr/bin/fish
container exec sandlot test -f /home/ubuntu/.claude/settings.json && echo ok
container exec sandlot test -f /home/ubuntu/.claude/api-key-helper.sh && echo ok
container exec sandlot cat /home/ubuntu/.claude.json  # should have hasCompletedOnboarding: true
```
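
Because the same side effects have to be verified once per implementation, it can help to turn the commented expectations above into pass/fail checks. The `check` helper below is a hypothetical sketch, not part of sandlot; the expected paths are the ones listed above.

```bash
# check <expected> <command...>: compare a command's stdout to an expected string.
check() {
  chk_expected=$1; shift
  chk_actual=$("$@" 2>/dev/null) || true
  if [ "$chk_actual" = "$chk_expected" ]; then
    echo "ok   $*"
  else
    echo "FAIL $* (expected '$chk_expected', got '$chk_actual')"
    return 1
  fi
}

# Example, run after `vm create`:
#   check /home/ubuntu/.local/bin/claude container exec sandlot which claude
#   check /home/ubuntu/.local/bin/bun    container exec sandlot which bun
#   check /usr/bin/fish                  container exec sandlot which fish
```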

Now destroy and repeat with TS:
```bash
sandlot-rs vm destroy
sandlot-ts vm create
```
Verify the same side effects exist.

### Test 1.2: `vm status`

```bash
# With VM running:
sandlot-rs vm status
sandlot-ts vm status
```

**Expect (no sessions):**
```
VM: running    (in green)

No active sessions.   (in dim)
```

```bash
# JSON mode:
sandlot-rs vm status --json
sandlot-ts vm status --json
```

**Expect:** JSON with `"vm": "running"` and `"sessions": []`.

### Test 1.3: `vm stop`

```bash
sandlot-rs vm stop
```

**Expect:** Spinner, then `✔ VM stopped`. Exit code 0.

```bash
sandlot-rs vm status
```

**Expect:** `VM: stopped` (in yellow).

### Test 1.4: `vm start`

```bash
sandlot-rs vm start
```

**Expect:** `✔ VM started` on stdout. Exit code 0.

### Test 1.5: `vm info`

```bash
sandlot-rs vm info
sandlot-ts vm info
```

**Expect:** neofetch output (system info). Both should show identical container specs.

### Test 1.6: `vm shell`

```bash
sandlot-rs vm shell
```

**Expect:** Drops into an interactive fish shell inside the container. Type `exit` to leave. Verify the prompt works and `echo $PATH` includes the expected paths.

### Test 1.7: `vm destroy`

```bash
sandlot-rs vm destroy
```

**Expect:** Spinner, then `✔ VM destroyed`. Exit code 0.

```bash
sandlot-rs vm status
```

**Expect:** `VM: missing` (in red).

### Test 1.8: `vm create` (duplicate)

```bash
sandlot-rs vm create
# Then try again:
sandlot-rs vm create
```

**Expect second call:** Error: `Container already exists. Use 'sandlot vm destroy' first to recreate it.` Exit code 1.

### Test 1.9: `vm uncache`

```bash
sandlot-rs vm uncache
```

**Expect:** `✔ Package cache cleared` if cache existed, or `No cache to clear`.

### Test 1.10: `vm start` when missing

```bash
sandlot-rs vm destroy
sandlot-rs vm start
```

**Expect:** Error: `Container does not exist. Use 'sandlot vm create' first.` Exit code 1.
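
Error-path tests such as 1.8 and 1.10 check two things at once: the exit code and the message. A small helper can assert both; `expect_fail` is a hypothetical name, and the sketch matches the message as a fixed substring of combined stdout and stderr rather than checking which stream it arrived on.

```bash
# expect_fail <code> <message> <command...>
# Passes only if the command exits with <code> and its combined output
# contains <message> as a fixed string.
expect_fail() {
  ef_code=$1; ef_msg=$2; shift 2
  ef_got=0
  ef_out=$("$@" 2>&1) || ef_got=$?
  if [ "$ef_got" -ne "$ef_code" ]; then
    echo "FAIL: exit code $ef_got, expected $ef_code"
    return 1
  fi
  if ! printf '%s\n' "$ef_out" | grep -qF -- "$ef_msg"; then
    echo "FAIL: message not found: $ef_msg"
    return 1
  fi
  echo "ok"
}

# Example (pass the real binary path, since aliases are not expanded as arguments):
#   expect_fail 1 "Container does not exist." \
#     /path/to/rust-rewrite/rust-sandlot/target/release/sandlot vm start
```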

---

## Phase 2: Session Lifecycle

Ensure a VM is running before starting: run `sandlot-rs vm create` (otherwise `ensure` will auto-create it).

### Test 2.1: `new` with explicit branch name

```bash
sandlot-rs new test-branch-1
# Claude launches interactively. Press Ctrl+C or /exit to quit.
```

**Expect:**
- Spinner: "Creating worktree" -> "Starting container" -> `✔ [test-branch-1] Session ready`
- Claude Code launches in the container
- After exit, auto-save runs (spinner: "Staging changes" -> either "No changes to commit" or "Saved: ...")

**Verify side effects:**
```bash
ls -la ~/.sandlot/sandlot-test-repo/test-branch-1/  # worktree exists
ls -la .sandlot/test-branch-1                         # symlink exists
cat .sandlot/state.json                                # session entry exists
git worktree list                                      # shows the worktree
```
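
The file-level checks above can be bundled into one function so they run identically for both implementations. This is a sketch under assumptions: `check_session` is a made-up name, and the state check only greps for the quoted branch name, since the exact `state.json` schema is not described here. The `git worktree list` check is left as a manual step.

```bash
# check_session <repo-name> <branch>: run from the repo root.
# Paths follow the layout used in the commands above.
check_session() {
  cs_repo=$1; cs_branch=$2; cs_rc=0
  [ -d "$HOME/.sandlot/$cs_repo/$cs_branch" ] || { echo "missing worktree"; cs_rc=1; }
  [ -L ".sandlot/$cs_branch" ] || { echo "missing symlink"; cs_rc=1; }
  # Assumption: the session entry mentions the branch as a quoted string.
  grep -qF -- "\"$cs_branch\"" .sandlot/state.json 2>/dev/null \
    || { echo "missing state entry"; cs_rc=1; }
  return "$cs_rc"
}

# Example:
#   check_session sandlot-test-repo test-branch-1 && echo "session ok"
```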

### Test 2.2: `new` with no branch (random name)

```bash
sandlot-rs new
```

**Expect:** A random `adjective-noun` branch name is generated (e.g., `calm-fern`). The rest of the flow is identical to 2.1.

### Test 2.3: `new` with prompt (spaces in "branch")

```bash
sandlot-rs new "fix the login bug on the settings page"
```

**Expect:** The text is treated as a prompt. A branch name is derived via the Claude Haiku API (e.g., `login-fix`). If the API call fails, it falls back to the first two words (`fix-the`). The prompt is stored in `state.json`.

### Test 2.4: `new` with `-p` (print mode)

```bash
sandlot-rs new -p "what is 2+2"
```

**Expect:**
- Branch name derived from the prompt
- Spinner: "Creating worktree" -> "Starting container" -> "Running prompt..."
- Claude's response printed to stdout (rendered as markdown)
- No interactive session
- Auto-save runs after

### Test 2.5: `new` duplicate session

```bash
sandlot-rs new test-branch-1
```

**Expect:** `✖ Session "test-branch-1" already exists. Use "sandlot open test-branch-1" to re-enter it.` Exit code 1.

### Test 2.6: `list` with sessions

```bash
sandlot-rs list
```

**Expect:**
```
  BRANCH          PROMPT
◯ test-branch-1
◯ other-branch    fix the login bug...

◯ idle · ◎ active · ◐ unsaved · ● saved · ⦿ review
```

Status icons use ANSI colors (dim for idle, cyan for active, yellow for unsaved, green for saved, magenta for review).

```bash
sandlot-rs list --json
```

**Expect:** JSON array with each session having `branch`, `worktree`, `created_at`, `prompt`, `in_review`, `status`, `repoRoot` fields.
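
A quick way to confirm that both implementations emit every expected key is a textual scan of the JSON. The sketch below is deliberately crude: `missing_fields` is a hypothetical helper that only looks for each quoted key name and does not parse the JSON or check values.

```bash
# missing_fields: read JSON on stdin and print each expected key that is absent.
missing_fields() {
  mf_json=$(cat)
  for mf_key in branch worktree created_at prompt in_review status repoRoot; do
    printf '%s' "$mf_json" | grep -qF -- "\"$mf_key\"" || echo "$mf_key"
  done
}

# Example (no output means every key was found):
#   sandlot-rs list --json | missing_fields
#   sandlot-ts list --json | missing_fields
```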

### Test 2.7: `open` existing session

```bash
sandlot-rs open test-branch-1
```

**Expect:**
- Spinner: "Starting container" -> `✔ [test-branch-1] Session ready`
- Claude launches with `--continue` (resumes prior conversation)
- After exit, auto-save runs

### Test 2.8: `open` with `--no-save`

```bash
sandlot-rs open test-branch-1 --no-save
```

**Expect:** Same as 2.7 but no auto-save after Claude exits.

### Test 2.9: `open` nonexistent session but existing branch

If a branch exists but its session entry is missing from `state.json`, `open` should recreate the session:

```bash
# Remove from state but keep the branch
cat .sandlot/state.json  # note the session
# Manually edit state.json to remove the session entry
sandlot-rs open test-branch-1
```

**Expect:** Worktree is recreated, session is re-added to state, Claude launches.

### Test 2.10: `open` nonexistent branch

```bash
sandlot-rs open nonexistent-branch-xyz
```

**Expect:** `✖ No session or branch found for "nonexistent-branch-xyz".` Exit code 1.

---

## Phase 3: Branch Operations (read-only)

These commands read git state without modifying it. Create a session with some commits first:

```bash
sandlot-rs new branch-ops-test
# Inside Claude, make some changes and commit, then exit
# Or manually:
cd ~/.sandlot/sandlot-test-repo/branch-ops-test
echo "new file" > test.txt
git add . && git commit -m "add test file"
cd /tmp/sandlot-test-repo
```

### Test 3.1: `diff`

```bash
sandlot-rs diff branch-ops-test
```

**Expect:**
- If uncommitted changes in worktree: shows `git diff HEAD`
- If clean: shows `git diff main...branch-ops-test`
- Output piped through git's native diff display (with colors if the terminal supports them)

Compare with:
```bash
sandlot-ts diff branch-ops-test
```

### Test 3.2: `log`

```bash
sandlot-rs log branch-ops-test
```

**Expect:**
- If the session has a prompt, prints `PROMPT: <text>` to stderr first
- Shows `git log main..HEAD` output with commit hashes highlighted in yellow
- Piped through pager if output exceeds terminal height

### Test 3.3: `show`

```bash
sandlot-rs show branch-ops-test
```

**Expect:**
- Prints prompt to stderr (if stored)
- Shows full `git diff main...branch` output on stdout

### Test 3.4: `web`

```bash
sandlot-rs web branch-ops-test
```

**Expect:**
- Generates `/tmp/sandlot-branch-ops-test.html`
- Opens it in the default browser
- HTML contains: branch name, prompt, commit log, diff stats, syntax-highlighted diff

**Verify:** Open the generated HTML file and compare it with the one generated by `sandlot-ts web branch-ops-test`.

### Test 3.5: `dir`

```bash
sandlot-rs dir branch-ops-test
```

**Expect:** Prints the absolute worktree path to stdout, e.g., `/Users/you/.sandlot/sandlot-test-repo/branch-ops-test`.

### Test 3.6: `dir` nonexistent session

```bash
sandlot-rs dir nonexistent
```

**Expect:** `✖ No session found for branch "nonexistent".` Exit code 1.

---

## Phase 4: Save, Merge, Squash, Rebase

### Test 4.1: `save` with auto-generated message

```bash
# Make changes in the worktree first:
echo "change" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs save branch-ops-test
```

**Expect:**
- Spinner: `[branch-ops-test] Staging changes` -> `Starting container` -> `Generating commit message` -> `Committing` -> `✔ [branch-ops-test] Saved: <commit message>`
- The commit message is AI-generated from the diff

### Test 4.2: `save` with explicit message

```bash
echo "another change" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs save branch-ops-test "manual commit message"
```

**Expect:**
- Spinner: `Staging changes` -> `Committing` -> `✔ [branch-ops-test] Saved: manual commit message`
- No AI generation (no container startup needed for the message)

### Test 4.3: `save` with no changes

```bash
sandlot-rs save branch-ops-test
```

**Expect:** `✖ [branch-ops-test] No changes to commit`. Exit code 1.

### Test 4.4: `squash`

```bash
# Ensure branch has multiple commits beyond main
sandlot-rs squash branch-ops-test
```

**Expect:**
- Spinner: `[branch-ops-test] Squashing` -> `Starting container` -> `Generating commit message` -> `✔ [branch-ops-test] Squashed branch-ops-test into a single commit`
- `git log main..HEAD` in the worktree should show exactly 1 commit
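
The single-commit expectation can be checked numerically instead of by reading the log. A minimal sketch, assuming the base branch is `main` as elsewhere in this document; `commits_ahead` is a made-up helper name.

```bash
# commits_ahead <worktree>: number of commits on HEAD that are not on main.
commits_ahead() {
  git -C "$1" rev-list --count main..HEAD
}

# Example, expected to print 1 after a successful squash:
#   commits_ahead ~/.sandlot/sandlot-test-repo/branch-ops-test
```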

### Test 4.5: `squash` with no commits

```bash
sandlot-rs new fresh-branch
# Exit Claude immediately without making changes
sandlot-rs squash fresh-branch
```

**Expect:** `✖ Branch "fresh-branch" has no commits beyond main.` Exit code 1.

### Test 4.6: `squash` with dirty worktree

```bash
echo "dirty" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs squash branch-ops-test
```

**Expect:** `✖ Branch "branch-ops-test" has unsaved changes. Run "sandlot save branch-ops-test" first.` Exit code 1.

### Test 4.7: `rebase`

Set up a scenario where main has advanced:

```bash
# In the main repo, add a commit to main
cd /tmp/sandlot-test-repo
echo "main change" > main-file.txt
git add . && git commit -m "advance main"

sandlot-rs rebase branch-ops-test
```

**Expect (clean rebase):**
- Spinner: `[branch-ops-test] Fetching origin` -> `Rebasing onto origin/main` -> `✔ [branch-ops-test] Rebased branch-ops-test onto main`

**Expect (with conflicts):**
- `◆ Rebase conflicts in N file(s). Resolving with Claude...`
- Spinner: `[branch-ops-test] Starting container` -> `(1/N) Resolving <file> (round 1)` -> `✔ [branch-ops-test] Rebased branch-ops-test onto main (resolved N conflict round(s))`

### Test 4.8: `rebase` with dirty worktree

```bash
echo "dirty" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs rebase branch-ops-test
```

**Expect:** `✖ Branch "branch-ops-test" has unsaved changes. Run "sandlot save branch-ops-test" first.` Exit code 1.

### Test 4.9: `merge`

```bash
cd /tmp/sandlot-test-repo
git checkout main
sandlot-rs merge branch-ops-test
```

**Expect (clean merge):**
- Spinner: `Merging branch-ops-test` -> `✔ Merged branch-ops-test into main`
- Session is torn down (worktree removed, symlink removed, state cleared)
- Local branch is deleted

**Expect (with conflicts):**
- Spinner: `Resolving N conflict(s)` -> `Starting container` -> `(1/N) Resolving <file>` -> `✔ Resolved N conflict(s) and merged branch-ops-test`
- Same cleanup as clean merge

### Test 4.10: `merge` not on main

```bash
git checkout -b other-branch
sandlot-rs merge some-branch
```

**Expect:** `✖ You must be on "main" to merge. Currently on "other-branch". Use --force to merge into "other-branch" anyway.` Exit code 1.

### Test 4.11: `merge --force` on non-main

```bash
sandlot-rs merge some-branch --force
```

**Expect:** Merge proceeds into `other-branch` instead of `main`.

### Test 4.12: `merge` with dirty session

```bash
echo "dirty" >> ~/.sandlot/sandlot-test-repo/some-branch/file.txt
sandlot-rs merge some-branch
```

**Expect:** `✖ Branch "some-branch" has unsaved changes. Run "sandlot save some-branch" first.` Exit code 1.

---

## Phase 5: Review

### Test 5.1: `review` interactive

```bash
sandlot-rs review branch-ops-test
```

**Expect:**
- Spinner: `[branch-ops-test] Starting container` -> `✔ [branch-ops-test] Session ready`
- Claude launches with the review prompt (4-agent grumpy senior engineer review)
- `state.json` shows `in_review: true` during the review
- After exit: `in_review` is cleared, auto-save runs

### Test 5.2: `review --print`

```bash
sandlot-rs review branch-ops-test --print
```

**Expect:**
- Spinner: `[branch-ops-test] Starting container` -> `Running review...`
- Review output printed to stdout (not interactive)
- No auto-save after

### Test 5.3: `review` with extra prompt

```bash
sandlot-rs review branch-ops-test "also check for SQL injection"
```

**Expect:** The extra text is appended to the review prompt. Claude receives both the standard review instructions and the additional context.

---

## Phase 6: Shell and Edit

### Test 6.1: `shell` with branch

```bash
sandlot-rs shell branch-ops-test
```

**Expect:** Interactive fish shell opens in the worktree directory inside the container. `pwd` should show the container-translated worktree path.

### Test 6.2: `shell` without branch

```bash
sandlot-rs shell
```

**Expect:** Interactive fish shell opens at a default location (no `--workdir` flag).

### Test 6.3: `edit`

```bash
export EDITOR=vim
sandlot-rs edit branch-ops-test test.txt
```

**Expect:** vim opens the file at the worktree path. After closing, exits cleanly.

### Test 6.4: `edit` with missing EDITOR

```bash
unset EDITOR
sandlot-rs edit branch-ops-test test.txt
```

**Expect:** `✖ $EDITOR is not set.` Exit code 1.

### Test 6.5: `edit` with missing file

```bash
export EDITOR=vim
sandlot-rs edit branch-ops-test nonexistent.txt
```

**Expect:** `✖ File not found: nonexistent.txt` Exit code 1.

### Test 6.6: `edit` path escape attempt

```bash
sandlot-rs edit branch-ops-test ../../etc/passwd
```

**Expect:** Error (path escapes the worktree). The exact message may vary but should prevent access.

---

## Phase 7: Close and Checkout

### Test 7.1: `close` clean session

```bash
sandlot-rs close test-branch-1
```

**Expect:**
- `✔ Closed session test-branch-1` on stdout
- Worktree removed from `~/.sandlot/...`
- Symlink removed from `.sandlot/test-branch-1`
- Session removed from `state.json`
- Local branch deleted
- Exit code 0

### Test 7.2: `close` dirty session

```bash
# Set up a dirty session first
sandlot-rs new dirty-test
echo "uncommitted" > ~/.sandlot/sandlot-test-repo/dirty-test/uncommitted.txt
sandlot-rs close dirty-test
```

**Expect:** `✖ Branch "dirty-test" has unsaved changes. Run "sandlot save dirty-test" first, or use -f to force.` Exit code 1.

### Test 7.3: `close --force` dirty session

```bash
sandlot-rs close dirty-test --force
```

**Expect:** `✔ Closed session dirty-test`. Session is torn down despite uncommitted changes.

### Test 7.4: `rm` alias

```bash
sandlot-rs rm some-branch
```

**Expect:** Identical to `close`. The `rm` command is a hidden alias.

### Test 7.5: `close` nonexistent session

```bash
sandlot-rs close nonexistent-xyz
```

**Expect:** `✖ No session found for branch "nonexistent-xyz".` Exit code 1.

### Test 7.6: `checkout`

```bash
sandlot-rs new checkout-test
# Make a commit
echo "data" > ~/.sandlot/sandlot-test-repo/checkout-test/data.txt
cd ~/.sandlot/sandlot-test-repo/checkout-test && git add . && git commit -m "data"
cd /tmp/sandlot-test-repo
sandlot-rs checkout checkout-test
```

**Expect:**
- `✔ Checked out checkout-test`
- Session torn down (worktree, symlink, state removed)
- `git branch` in main repo shows you're now on `checkout-test`
- Branch is NOT deleted (unlike `close` and `merge`)

### Test 7.7: `checkout` with dirty main worktree

```bash
echo "dirty" > /tmp/sandlot-test-repo/dirty.txt
sandlot-rs checkout some-branch
```

**Expect:** `✖ Working tree has uncommitted changes that may conflict with checkout. Commit or stash them first, or use -f to force.` Exit code 1.

### Test 7.8: `checkout --force` with dirty main worktree

```bash
sandlot-rs checkout some-branch --force
```

**Expect:** Proceeds despite dirty working tree.

---

## Phase 8: Cleanup and Upgrade

### Test 8.1: `cleanup` with stale sessions

```bash
# Create a session, then manually delete the worktree
sandlot-rs new stale-test
rm -rf ~/.sandlot/sandlot-test-repo/stale-test
sandlot-rs cleanup
```

**Expect:** `✔ Removed stale session: stale-test`. Session removed from state.json.

### Test 8.2: `cleanup` with no stale sessions

```bash
sandlot-rs cleanup
```

**Expect:** `No stale sessions found.` (or `No sessions to clean up.` if no sessions at all).

### Test 8.3: `upgrade`

```bash
sandlot-rs upgrade
```

**Expect:** Attempts to upgrade sandlot. Compare behavior with `sandlot-ts upgrade`. Both should attempt the same upgrade mechanism.

---

## Phase 9: List Status Resolution

This tests that `list` correctly resolves session status.

### Test 9.1: Idle session

```bash
sandlot-rs new idle-test
# Exit Claude immediately, no changes
sandlot-rs list
```

**Expect:** `idle-test` shows `◯` (dim circle) = idle.

### Test 9.2: Dirty session

```bash
echo "dirty" > ~/.sandlot/sandlot-test-repo/idle-test/dirty.txt
sandlot-rs list
```

**Expect:** `idle-test` shows `◐` (yellow half-circle) = unsaved.

### Test 9.3: Saved session

```bash
cd ~/.sandlot/sandlot-test-repo/idle-test
git add . && git commit -m "save"
cd /tmp/sandlot-test-repo
sandlot-rs list
```

**Expect:** `idle-test` shows `●` (green circle) = saved.
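
To compare the icon for one branch without eyeballing the table, the `list` output can be filtered. This sketch assumes the layout shown in Test 2.6 (icon in the first column, branch name in the second); `status_icon` is a hypothetical helper.

```bash
# status_icon <branch>: read `list` output on stdin and print the icon in the
# first column of the row whose second column is <branch>.
status_icon() {
  si_esc=$(printf '\033')
  # Strip color codes first so the icon compares as plain text.
  sed "s/${si_esc}\[[0-9;]*m//g" | awk -v b="$1" '$2 == b { print $1 }'
}

# Example:
#   sandlot-rs list 2>&1 | status_icon idle-test
```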

### Test 9.4: `list --all`

```bash
sandlot-rs list --all
```

**Expect:** Sessions grouped by repo name with headers:
```
── repo-name ──
  BRANCH  PROMPT
◯ branch  prompt text
```

### Test 9.5: `list` with no sessions

```bash
# Close all sessions first
sandlot-rs list
```

**Expect:** `◆ No active sessions.`

### Test 9.6: `list` with VM down

```bash
sandlot-rs vm stop
sandlot-rs list
```

**Expect:** Normal session list (all sessions show as idle, since status cannot be checked while the VM is down), plus:
```
VM is not running.    (in red)
```

---

## Phase 10: End-to-End Comparison

For each command tested above, run the same scenario with both `sandlot-ts` and `sandlot-rs` and compare:

1. **Exit codes** must be identical
2. **Stdout content** must be semantically identical (exact match after stripping ANSI if formatting differs)
3. **Stderr content** must match (error messages, spinner final lines)
4. **Side effects** must match:
   - Same files created/deleted
   - Same git state (branches, worktrees, commits)
   - Same state.json content (modulo timestamps)
   - Same container state

### Comparison script

```bash
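#!/bin/bash
# Compare a command between TS and Rust.
# Suited to read-only commands; reset state between runs for anything with side effects.
# Aliases are not expanded in scripts, so invoke the real commands directly.
# Update /path/to/rust-rewrite to match your checkout.
TS=(bun run /path/to/rust-rewrite/src/cli.ts)
RS=(/path/to/rust-rewrite/rust-sandlot/target/release/sandlot)

echo "=== TypeScript ==="
# "$@" preserves quoting, so prompts containing spaces stay a single argument
"${TS[@]}" "$@" 2>/tmp/ts-stderr; TS_EXIT=$?
echo "EXIT: $TS_EXIT"
cat /tmp/ts-stderr

echo ""
echo "=== Rust ==="
"${RS[@]}" "$@" 2>/tmp/rs-stderr; RS_EXIT=$?
echo "EXIT: $RS_EXIT"
cat /tmp/rs-stderr

echo ""
if [ "$TS_EXIT" = "$RS_EXIT" ]; then
  echo "EXIT CODES: MATCH ($TS_EXIT)"
else
  echo "EXIT CODES: MISMATCH (ts=$TS_EXIT rs=$RS_EXIT)"
fi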

---

## Known Differences to Accept

- **Timestamps** in `state.json` will differ between runs (different `created_at` values). Compare structure and non-timestamp fields only.
- **Spinner frame timing** may differ slightly. Only compare the final spinner message.
- **AI-generated content** (branch names from prompts, commit messages, conflict resolutions, reviews) will differ between runs since they involve LLM calls. Verify the format is correct, not the exact text.
- **Random branch names** from `sandlot new` (no args) will differ. Verify the format is `adjective-noun` from the same word lists.
- **Order of JSON object keys** may differ between serde_json (Rust) and JSON.stringify (TS). Compare semantically.
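
For the timestamp caveat, one option is to mask `created_at` values before diffing two copies of `state.json`. The sketch below is an assumption-laden text filter: `normalize_state` is a made-up name, and it treats everything after `"created_at":` up to the next comma, closing brace, or end of line as the value, whatever its actual format.

```bash
# normalize_state: replace every created_at value with a fixed placeholder.
normalize_state() {
  sed 's/"created_at"[[:space:]]*:[[:space:]]*[^,}]*/"created_at":"<ts>"/g'
}

# Example: snapshot the state file after each implementation's run, then diff.
#   normalize_state < .sandlot/state.json > /tmp/ts-state.json   # after the TS run
#   normalize_state < .sandlot/state.json > /tmp/rs-state.json   # after the Rust run
#   diff /tmp/ts-state.json /tmp/rs-state.json
```

Key order is not handled here. If `jq` is installed, piping both files through `jq -S .` first sorts object keys so that the diff only reflects real differences.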

## What Must Be Identical

- All error messages (exact wording, Unicode markers)
- Exit codes for all error and success paths
- File paths (worktree locations, symlink targets, state file location)
- Git operations (same branches created/deleted, same merge behavior)
- Container commands (same `container exec` invocations, same environment variables)
- Flag parsing (`-f`, `--force`, `-p`, `--print`, `-n`, `--no-save`, `--json`, `-a`, `--all`)
- Default behavior (no args = `list`)
- Shell init output (`init fish`, `init bash`, `init zsh`) -- these were already verified byte-for-byte identical
- Fish/bash/zsh completions -- already verified byte-for-byte identical