Sandlot Rust Rewrite: VM Integration Testing
This document describes how to test the Rust rewrite of sandlot against the TypeScript original. The goal is to verify identical behavior for every command that interacts with the VM/container, git worktrees, or session state.
Prerequisites
- macOS on Apple Silicon
- Apple Container installed (brew install container)
- Rust toolchain (rustup)
- Bun installed (brew install oven-sh/bun/bun)
- An ANTHROPIC_API_KEY in ~/.env (format: ANTHROPIC_API_KEY=sk-ant-...)
- A git repo to use as a test bed (create a throwaway one)
Setup
1. Build the Rust binary
cd rust-sandlot
cargo build --release
The binary is at ./rust-sandlot/target/release/sandlot.
2. Set up aliases
Use two distinct aliases so you can run either implementation:
alias sandlot-ts='bun run /path/to/rust-rewrite/src/cli.ts'
alias sandlot-rs='/path/to/rust-rewrite/rust-sandlot/target/release/sandlot'
3. Destroy any existing VM
Start from a clean slate. Both implementations share the same container name (sandlot), so only one can be tested at a time:
sandlot-ts vm destroy 2>/dev/null
4. Create a test repo
mkdir /tmp/sandlot-test-repo && cd /tmp/sandlot-test-repo
git init -b main
echo "hello" > README.md
git add . && git commit -m "initial commit"
All tests below assume you run commands from inside this repo.
Testing methodology
For each test:
- Run the command with sandlot-ts first, observe the result
- Clean up / reset state
- Run the same command with sandlot-rs, observe the result
- Compare: stdout content, stderr content, exit code, and side effects (files created, git state, container state)
Some commands produce animated spinner output on stderr. The final line of spinner output is what matters (the success/failure message). Intermediate spinner frames are cosmetic and may differ in timing.
When comparing output, strip ANSI codes for semantic comparison:
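# Build the ESC byte with printf: BSD sed (the macOS default) may not interpret \x1b.
ESC=$(printf '\033')
sandlot-rs list 2>&1 | sed "s/${ESC}\[[0-9;]*m//g"
sandlot-ts list 2>&1 | sed "s/${ESC}\[[0-9;]*m//g"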
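The two comparison rules above (only the final spinner line matters, and ANSI codes are ignored) can be wrapped in small reusable helpers. This is a minimal sketch, not part of sandlot: the names strip_ansi and final_line are hypothetical, and it assumes spinner frames are separated by carriage returns or newlines, which this document does not specify.

```shell
# Hypothetical helpers; neither is part of sandlot.

# Remove ANSI color sequences (ESC [ ... m) from stdin.
# The ESC byte comes from printf, so this does not rely on sed
# understanding \x1b.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g"
}

# Print the last non-empty line of spinner output read from stdin.
# Assumption: frames are separated by carriage returns or newlines.
final_line() {
  tr '\r' '\n' | strip_ansi | sed '/^[[:space:]]*$/d' | tail -n 1
}
```

For example, `sandlot-rs vm create 2>&1 >/dev/null | final_line` would pipe only stderr into the helper and print the closing success or failure message.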
Phase 1: VM Lifecycle
These tests verify container management. Run them in order.
Test 1.1: vm create
sandlot-ts vm destroy 2>/dev/null # clean slate
sandlot-rs vm create
Expect:
- Spinner output on stderr progressing through: "Creating VM" -> "Pulling image & creating container" -> "Installing packages" -> "Installing Bun" -> "Installing Claude Code" -> "Installing neofetch" -> "Installing Neovim" -> "Configuring environment"
- Final line: ✔ VM created
- Exit code: 0
Verify side effects:
container list --format json --all # should show "sandlot" container running
container exec sandlot which claude # should print /home/ubuntu/.local/bin/claude
container exec sandlot which bun # should print /home/ubuntu/.local/bin/bun
container exec sandlot which fish # should print /usr/bin/fish
container exec sandlot test -f /home/ubuntu/.claude/settings.json && echo ok
container exec sandlot test -f /home/ubuntu/.claude/api-key-helper.sh && echo ok
container exec sandlot cat /home/ubuntu/.claude.json # should have hasCompletedOnboarding: true
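The three `which` checks above can be run in one pass. The following is a sketch under stated assumptions: check_tool_paths and the CONTAINER_CMD variable are hypothetical names introduced here (the variable exists only so the check can be pointed at a stub), and the expected paths are copied from the comments above.

```shell
# Hypothetical helper; not part of sandlot.
# CONTAINER_CMD is an assumption: it defaults to the real `container` CLI
# and exists only so the check can be stubbed.
check_tool_paths() {
  runner=${CONTAINER_CMD:-container}
  rc=0
  while read -r tool expected; do
    # </dev/null keeps the command from consuming the here-document.
    actual=$($runner exec sandlot which "$tool" </dev/null 2>/dev/null)
    if [ "$actual" = "$expected" ]; then
      echo "ok   $tool"
    else
      echo "FAIL $tool: got '$actual', expected '$expected'"
      rc=1
    fi
  done <<EOF
claude /home/ubuntu/.local/bin/claude
bun /home/ubuntu/.local/bin/bun
fish /usr/bin/fish
EOF
  return $rc
}
```

Running `check_tool_paths` after each implementation's `vm create` gives a quick pass/fail summary; the file checks (settings.json, api-key-helper.sh, .claude.json) still need to be run separately.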
Now destroy and repeat with TS:
sandlot-rs vm destroy
sandlot-ts vm create
Verify the same side effects exist.
Test 1.2: vm status
# With VM running:
sandlot-rs vm status
sandlot-ts vm status
Expect (no sessions):
VM: running (in green)
No active sessions. (in dim)
# JSON mode:
sandlot-rs vm status --json
sandlot-ts vm status --json
Expect: JSON with "vm": "running" and "sessions": [].
Test 1.3: vm stop
sandlot-rs vm stop
Expect: Spinner, then ✔ VM stopped. Exit code 0.
sandlot-rs vm status
Expect: VM: stopped (in yellow).
Test 1.4: vm start
sandlot-rs vm start
Expect: ✔ VM started on stdout. Exit code 0.
Test 1.5: vm info
sandlot-rs vm info
sandlot-ts vm info
Expect: neofetch output (system info). Both should show identical container specs.
Test 1.6: vm shell
sandlot-rs vm shell
Expect: Drops into an interactive fish shell inside the container. Type exit to leave. Verify the prompt works and echo $PATH includes the expected paths.
Test 1.7: vm destroy
sandlot-rs vm destroy
Expect: Spinner, then ✔ VM destroyed. Exit code 0.
sandlot-rs vm status
Expect: VM: missing (in red).
Test 1.8: vm create (duplicate)
sandlot-rs vm create
# Then try again:
sandlot-rs vm create
Expect second call: Error: Container already exists. Use 'sandlot vm destroy' first to recreate it. Exit code 1.
Test 1.9: vm uncache
sandlot-rs vm uncache
Expect: ✔ Package cache cleared if cache existed, or No cache to clear.
Test 1.10: vm start when missing
sandlot-rs vm destroy
sandlot-rs vm start
Expect: Error: Container does not exist. Use 'sandlot vm create' first. Exit code 1.
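Several tests in this phase (1.8, 1.10) assert both an exit code and an error message. A small helper can check both at once. This is a sketch only: expect_error is a hypothetical name, and it captures stdout and stderr together because this document does not say which stream carries the error text.

```shell
# Hypothetical helper; not part of sandlot.
# Usage: expect_error <exit-code> <text> <command> [args...]
# Passes when the command exits with <exit-code> and its combined
# stdout/stderr contains <text> as a fixed string.
expect_error() {
  want_code=$1
  want_text=$2
  shift 2
  out=$("$@" 2>&1)
  got_code=$?
  if [ "$got_code" -ne "$want_code" ]; then
    echo "FAIL: exit $got_code, expected $want_code"
    return 1
  fi
  if ! printf '%s\n' "$out" | grep -qF -- "$want_text"; then
    echo "FAIL: output does not contain: $want_text"
    return 1
  fi
  echo "ok"
}
```

Aliases are not expanded when they appear as arguments, so pass the binary path rather than the alias, for example `expect_error 1 "Container already exists" /path/to/rust-rewrite/rust-sandlot/target/release/sandlot vm create`.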
Phase 2: Session Lifecycle
Make sure a VM is running before starting: run sandlot-rs vm create (or rely on the ensure step, which will auto-create it).
Test 2.1: new with explicit branch name
sandlot-rs new test-branch-1
# Claude launches interactively. Press Ctrl+C or /exit to quit.
Expect:
- Spinner: "Creating worktree" -> "Starting container" -> ✔ [test-branch-1] Session ready
- Claude Code launches in the container
- After exit, auto-save runs (spinner: "Staging changes" -> either "No changes to commit" or "Saved: ...")
Verify side effects:
ls -la ~/.sandlot/sandlot-test-repo/test-branch-1/ # worktree exists
ls -la .sandlot/test-branch-1 # symlink exists
cat .sandlot/state.json # session entry exists
git worktree list # shows the worktree
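The four side-effect checks above can be bundled into one function that is reused for every session-creating test. This is a minimal sketch: check_session is a hypothetical name, and the state.json check assumes the branch name appears as a quoted string somewhere in that file, since the schema is not described here.

```shell
# Hypothetical helper; run from the root of the test repo.
# Usage: check_session <branch>
check_session() {
  branch=$1
  repo=$(basename "$PWD")
  rc=0
  [ -d "$HOME/.sandlot/$repo/$branch" ] || { echo "FAIL: worktree directory missing"; rc=1; }
  [ -L ".sandlot/$branch" ] || { echo "FAIL: symlink missing"; rc=1; }
  # Assumption: the branch name appears as a quoted string in state.json.
  grep -qF "\"$branch\"" .sandlot/state.json 2>/dev/null \
    || { echo "FAIL: no state.json entry"; rc=1; }
  # git prints the checked-out branch of each worktree in square brackets.
  git worktree list 2>/dev/null | grep -qF "[$branch]" \
    || { echo "FAIL: not in git worktree list"; rc=1; }
  if [ "$rc" -eq 0 ]; then echo "ok: session $branch"; fi
  return $rc
}
```

After `sandlot-rs new test-branch-1`, `check_session test-branch-1` should print the ok line; the same call is expected to fail once the session has been closed.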
Test 2.2: new with no branch (random name)
sandlot-rs new
Expect: A random adjective-noun branch name is generated (e.g., calm-fern). The rest of the flow is identical to 2.1.
Test 2.3: new with prompt (spaces in "branch")
sandlot-rs new "fix the login bug on the settings page"
Expect: The text is treated as a prompt. A branch name is derived via Claude Haiku API (e.g., login-fix). If the API call fails, falls back to first two words (fix-the). The prompt is stored in state.json.
Test 2.4: new with -p (print mode)
sandlot-rs new -p "what is 2+2"
Expect:
- Branch name derived from the prompt
- Spinner: "Creating worktree" -> "Starting container" -> "Running prompt..."
- Claude's response printed to stdout (rendered as markdown)
- No interactive session
- Auto-save runs after
Test 2.5: new duplicate session
sandlot-rs new test-branch-1
Expect: ✖ Session "test-branch-1" already exists. Use "sandlot open test-branch-1" to re-enter it. Exit code 1.
Test 2.6: list with sessions
sandlot-rs list
Expect:
BRANCH PROMPT
◯ test-branch-1
◯ other-branch fix the login bug...
◯ idle · ◎ active · ◐ unsaved · ● saved · ⦿ review
Status icons use ANSI colors (dim for idle, cyan for active, yellow for dirty, green for saved, magenta for review).
sandlot-rs list --json
Expect: JSON array with each session having branch, worktree, created_at, prompt, in_review, status, repoRoot fields.
Test 2.7: open existing session
sandlot-rs open test-branch-1
Expect:
- Spinner: "Starting container" -> ✔ [test-branch-1] Session ready
- Claude launches with --continue (resumes prior conversation)
- After exit, auto-save runs
Test 2.8: open with --no-save
sandlot-rs open test-branch-1 --no-save
Expect: Same as 2.7 but no auto-save after Claude exits.
Test 2.9: open nonexistent session but existing branch
If you manually create a branch and remove the session from state.json, open should recreate the session:
# Remove from state but keep the branch
cat .sandlot/state.json # note the session
# Manually edit state.json to remove the session entry
sandlot-rs open test-branch-1
Expect: Worktree is recreated, session is re-added to state, Claude launches.
Test 2.10: open nonexistent branch
sandlot-rs open nonexistent-branch-xyz
Expect: ✖ No session or branch found for "nonexistent-branch-xyz". Exit code 1.
Phase 3: Branch Operations (read-only)
These commands read git state without modifying it. Create a session with some commits first:
sandlot-rs new branch-ops-test
# Inside Claude, make some changes and commit, then exit
# Or manually:
cd ~/.sandlot/sandlot-test-repo/branch-ops-test
echo "new file" > test.txt
git add . && git commit -m "add test file"
cd /tmp/sandlot-test-repo
Test 3.1: diff
sandlot-rs diff branch-ops-test
Expect:
- If uncommitted changes in worktree: shows git diff HEAD
- If clean: shows git diff main...branch-ops-test
- Output piped through git's native diff display (with colors if terminal supports)
Compare with:
sandlot-ts diff branch-ops-test
Test 3.2: log
sandlot-rs log branch-ops-test
Expect:
- If the session has a prompt, prints PROMPT: <text> to stderr first
- Shows git log main..HEAD output with commit hashes highlighted in yellow
- Piped through pager if output exceeds terminal height
Test 3.3: show
sandlot-rs show branch-ops-test
Expect:
- Prints prompt to stderr (if stored)
- Shows full git diff main...branch output on stdout
Test 3.4: web
sandlot-rs web branch-ops-test
Expect:
- Generates /tmp/sandlot-branch-ops-test.html
- Opens it in the default browser
- HTML contains: branch name, prompt, commit log, diff stats, syntax-highlighted diff
Verify: Open the generated HTML file and compare it with the one generated by sandlot-ts web branch-ops-test.
Test 3.5: dir
sandlot-rs dir branch-ops-test
Expect: Prints the absolute worktree path to stdout, e.g., /Users/you/.sandlot/sandlot-test-repo/branch-ops-test.
Test 3.6: dir nonexistent session
sandlot-rs dir nonexistent
Expect: ✖ No session found for branch "nonexistent". Exit code 1.
Phase 4: Save, Merge, Squash, Rebase
Test 4.1: save with auto-generated message
# Make changes in the worktree first:
echo "change" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs save branch-ops-test
Expect:
- Spinner: [branch-ops-test] Staging changes -> Starting container -> Generating commit message -> Committing -> ✔ [branch-ops-test] Saved: <commit message>
- The commit message is AI-generated from the diff
Test 4.2: save with explicit message
echo "another change" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs save branch-ops-test "manual commit message"
Expect:
- Spinner: Staging changes -> Committing -> ✔ [branch-ops-test] Saved: manual commit message
- No AI generation (no container startup needed for the message)
Test 4.3: save with no changes
sandlot-rs save branch-ops-test
Expect: ✖ [branch-ops-test] No changes to commit. Exit code 1.
Test 4.4: squash
# Ensure branch has multiple commits beyond main
sandlot-rs squash branch-ops-test
Expect:
- Spinner: [branch-ops-test] Squashing -> Starting container -> Generating commit message -> ✔ [branch-ops-test] Squashed branch-ops-test into a single commit
- git log main..HEAD in the worktree should show exactly 1 commit
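The "exactly 1 commit" check can be made mechanical instead of reading git log output by eye. A minimal sketch, assuming the base branch is named main as elsewhere in this document; commits_beyond_main is a hypothetical helper name.

```shell
# Hypothetical helper: count commits reachable from the worktree's HEAD
# but not from main.
# Usage: commits_beyond_main <worktree-path>
commits_beyond_main() {
  git -C "$1" rev-list --count main..HEAD
}
```

After a squash, `[ "$(commits_beyond_main ~/.sandlot/sandlot-test-repo/branch-ops-test)" -eq 1 ]` should succeed. The same helper returning 0 describes the precondition for the "no commits" error case.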
Test 4.5: squash with no commits
sandlot-rs new fresh-branch
# Exit Claude immediately without making changes
sandlot-rs squash fresh-branch
Expect: ✖ Branch "fresh-branch" has no commits beyond main. Exit code 1.
Test 4.6: squash with dirty worktree
echo "dirty" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs squash branch-ops-test
Expect: ✖ Branch "branch-ops-test" has unsaved changes. Run "sandlot save branch-ops-test" first. Exit code 1.
Test 4.7: rebase
Set up a scenario where main has advanced:
# In the main repo, add a commit to main
cd /tmp/sandlot-test-repo
echo "main change" > main-file.txt
git add . && git commit -m "advance main"
sandlot-rs rebase branch-ops-test
Expect (clean rebase):
- Spinner: [branch-ops-test] Fetching origin -> Rebasing onto origin/main -> ✔ [branch-ops-test] Rebased branch-ops-test onto main
Expect (with conflicts):
- ◆ Rebase conflicts in N file(s). Resolving with Claude...
- Spinner: [branch-ops-test] Starting container -> (1/N) Resolving <file> (round 1) -> ✔ [branch-ops-test] Rebased branch-ops-test onto main (resolved N conflict round(s))
Test 4.8: rebase with dirty worktree
echo "dirty" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs rebase branch-ops-test
Expect: ✖ Branch "branch-ops-test" has unsaved changes. Run "sandlot save branch-ops-test" first. Exit code 1.
Test 4.9: merge
cd /tmp/sandlot-test-repo
git checkout main
sandlot-rs merge branch-ops-test
Expect (clean merge):
- Spinner: Merging branch-ops-test -> ✔ Merged branch-ops-test into main
- Session is torn down (worktree removed, symlink removed, state cleared)
- Local branch is deleted
Expect (with conflicts):
- Spinner: Resolving N conflict(s) -> Starting container -> (1/N) Resolving <file> -> ✔ Resolved N conflict(s) and merged branch-ops-test
- Same cleanup as clean merge
Test 4.10: merge not on main
git checkout -b other-branch
sandlot-rs merge some-branch
Expect: ✖ You must be on "main" to merge. Currently on "other-branch". Use --force to merge into "other-branch" anyway. Exit code 1.
Test 4.11: merge --force on non-main
sandlot-rs merge some-branch --force
Expect: Merge proceeds into other-branch instead of main.
Test 4.12: merge with dirty session
echo "dirty" >> ~/.sandlot/sandlot-test-repo/some-branch/file.txt
sandlot-rs merge some-branch
Expect: ✖ Branch "some-branch" has unsaved changes. Run "sandlot save some-branch" first. Exit code 1.
Phase 5: Review
Test 5.1: review interactive
sandlot-rs review branch-ops-test
Expect:
- Spinner: [branch-ops-test] Starting container -> ✔ [branch-ops-test] Session ready
- Claude launches with the review prompt (4-agent grumpy senior engineer review)
- state.json shows in_review: true during the review
- After exit: in_review is cleared, auto-save runs
Test 5.2: review --print
sandlot-rs review branch-ops-test --print
Expect:
- Spinner: [branch-ops-test] Starting container -> Running review...
- Review output printed to stdout (not interactive)
- No auto-save after
Test 5.3: review with extra prompt
sandlot-rs review branch-ops-test "also check for SQL injection"
Expect: The extra text is appended to the review prompt. Claude receives both the standard review instructions and the additional context.
Phase 6: Shell and Edit
Test 6.1: shell with branch
sandlot-rs shell branch-ops-test
Expect: Interactive fish shell opens in the worktree directory inside the container. pwd should show the container-translated worktree path.
Test 6.2: shell without branch
sandlot-rs shell
Expect: Interactive fish shell opens at a default location (no --workdir flag).
Test 6.3: edit
export EDITOR=vim
sandlot-rs edit branch-ops-test test.txt
Expect: vim opens the file at the worktree path. After closing, exits cleanly.
Test 6.4: edit with missing EDITOR
unset EDITOR
sandlot-rs edit branch-ops-test test.txt
Expect: ✖ $EDITOR is not set. Exit code 1.
Test 6.5: edit with missing file
export EDITOR=vim
sandlot-rs edit branch-ops-test nonexistent.txt
Expect: ✖ File not found: nonexistent.txt Exit code 1.
Test 6.6: edit path escape attempt
sandlot-rs edit branch-ops-test ../../etc/passwd
Expect: Error (path escapes the worktree). The exact message may vary but should prevent access.
Phase 7: Close and Checkout
Test 7.1: close clean session
sandlot-rs close test-branch-1
Expect:
- ✔ Closed session test-branch-1 on stdout
- Worktree removed from ~/.sandlot/...
- Symlink removed from .sandlot/test-branch-1
- Session removed from state.json
- Local branch deleted
- Exit code 0
Test 7.2: close dirty session
# Set up a dirty session first
sandlot-rs new dirty-test
echo "uncommitted" > ~/.sandlot/sandlot-test-repo/dirty-test/uncommitted.txt
sandlot-rs close dirty-test
Expect: ✖ Branch "dirty-test" has unsaved changes. Run "sandlot save dirty-test" first, or use -f to force. Exit code 1.
Test 7.3: close --force dirty session
sandlot-rs close dirty-test --force
Expect: ✔ Closed session dirty-test. Session is torn down despite uncommitted changes.
Test 7.4: rm alias
sandlot-rs rm some-branch
Expect: Identical to close. The rm command is a hidden alias.
Test 7.5: close nonexistent session
sandlot-rs close nonexistent-xyz
Expect: ✖ No session found for branch "nonexistent-xyz". Exit code 1.
Test 7.6: checkout
sandlot-rs new checkout-test
# Make a commit
echo "data" > ~/.sandlot/sandlot-test-repo/checkout-test/data.txt
cd ~/.sandlot/sandlot-test-repo/checkout-test && git add . && git commit -m "data"
cd /tmp/sandlot-test-repo
sandlot-rs checkout checkout-test
Expect:
- ✔ Checked out checkout-test
- Session torn down (worktree, symlink, state removed)
- git branch in main repo shows you're now on checkout-test
- Branch is NOT deleted (unlike close and merge)
Test 7.7: checkout with dirty main worktree
echo "dirty" > /tmp/sandlot-test-repo/dirty.txt
sandlot-rs checkout some-branch
Expect: ✖ Working tree has uncommitted changes that may conflict with checkout. Commit or stash them first, or use -f to force. Exit code 1.
Test 7.8: checkout --force with dirty main worktree
sandlot-rs checkout some-branch --force
Expect: Proceeds despite dirty working tree.
Phase 8: Cleanup and Upgrade
Test 8.1: cleanup with stale sessions
# Create a session, then manually delete the worktree
sandlot-rs new stale-test
rm -rf ~/.sandlot/sandlot-test-repo/stale-test
sandlot-rs cleanup
Expect: ✔ Removed stale session: stale-test. Session removed from state.json.
Test 8.2: cleanup with no stale sessions
sandlot-rs cleanup
Expect: No stale sessions found. (or No sessions to clean up. if no sessions at all).
Test 8.3: upgrade
sandlot-rs upgrade
Expect: Attempts to upgrade sandlot. Compare behavior with sandlot-ts upgrade. Both should attempt the same upgrade mechanism.
Phase 9: List Status Resolution
This tests that list correctly resolves session status.
Test 9.1: Idle session
sandlot-rs new idle-test
# Exit Claude immediately, no changes
sandlot-rs list
Expect: idle-test shows ◯ (dim circle) = idle.
Test 9.2: Dirty session
echo "dirty" > ~/.sandlot/sandlot-test-repo/idle-test/dirty.txt
sandlot-rs list
Expect: idle-test shows ◐ (yellow half-circle) = unsaved.
Test 9.3: Saved session
cd ~/.sandlot/sandlot-test-repo/idle-test
git add . && git commit -m "save"
cd /tmp/sandlot-test-repo
sandlot-rs list
Expect: idle-test shows ● (green circle) = saved.
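Tests 9.1 to 9.3 each look for one icon next to one branch, which can be scripted by extracting that icon from the list output. The sketch below rests on an assumption: each session row has the form "<icon> <branch> [prompt...]", as in the sample output under Test 2.6. status_icon is a hypothetical name.

```shell
# Hypothetical helper: read `sandlot list` output on stdin and print the
# status icon shown for one branch.
# Assumption: session rows look like "<icon> <branch> [prompt...]".
# Usage: sandlot-rs list 2>&1 | status_icon <branch>
status_icon() {
  esc=$(printf '\033')
  # Strip ANSI colors first so the icon is a clean first field.
  sed "s/${esc}\[[0-9;]*m//g" | awk -v b="$1" '$2 == b { print $1; exit }'
}
```

With this, the three tests above reduce to comparing the output of `sandlot-rs list 2>&1 | status_icon idle-test` against ◯, ◐, and ● in turn. A branch literally named "idle" would also match the legend line, so avoid that name in test data.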
Test 9.4: list --all
sandlot-rs list --all
Expect: Sessions grouped by repo name with headers:
── repo-name ──
BRANCH PROMPT
◯ branch prompt text
Test 9.5: list with no sessions
# Close all sessions first
sandlot-rs list
Expect: ◆ No active sessions.
Test 9.6: list with VM down
sandlot-rs vm stop
sandlot-rs list
Expect: Normal session list (all show as idle since VM can't check status), plus:
VM is not running. (in red)
Phase 10: End-to-End Comparison
For each command tested above, run the same scenario with both sandlot-ts and sandlot-rs and compare:
- Exit codes must be identical
- Stdout content must be semantically identical (exact match after stripping ANSI if formatting differs)
- Stderr content must match (error messages, spinner final lines)
- Side effects must match:
- Same files created/deleted
- Same git state (branches, worktrees, commits)
- Same state.json content (modulo timestamps)
- Same container state
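One way to compare side effects is to capture a text snapshot after each implementation runs and diff the two. This is a sketch only: snapshot_state is a hypothetical helper, it covers git branches, worktrees, and the entries under .sandlot, and it leaves out container state and the contents of state.json.

```shell
# Hypothetical helper; run from the root of the test repo.
# Prints a sorted, timestamp-free summary of git and .sandlot state.
snapshot_state() {
  echo "## branches"
  git branch --format='%(refname:short)' | sort
  echo "## worktrees"
  git worktree list --porcelain | grep '^worktree ' | sort
  echo "## .sandlot entries"
  ls -A .sandlot 2>/dev/null | sort
}
```

A typical use would be `snapshot_state > /tmp/ts-state.txt` after the TypeScript run, `snapshot_state > /tmp/rs-state.txt` after the Rust run from the same starting state, then `diff /tmp/ts-state.txt /tmp/rs-state.txt`.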
Comparison script
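#!/bin/bash
# Compare a command between TS and Rust.
# Shell aliases are not expanded inside scripts, so wrap each
# implementation in a function (same placeholder paths as the aliases).
sandlot-ts() { bun run /path/to/rust-rewrite/src/cli.ts "$@"; }
sandlot-rs() { /path/to/rust-rewrite/rust-sandlot/target/release/sandlot "$@"; }

# "$@" is passed through quoted so arguments containing spaces
# (for example a prompt) stay intact.
echo "=== TypeScript ==="
sandlot-ts "$@" 2>/tmp/ts-stderr; TS_EXIT=$?
echo "EXIT: $TS_EXIT"
cat /tmp/ts-stderr
echo ""
echo "=== Rust ==="
sandlot-rs "$@" 2>/tmp/rs-stderr; RS_EXIT=$?
echo "EXIT: $RS_EXIT"
cat /tmp/rs-stderr
echo ""
if [ "$TS_EXIT" = "$RS_EXIT" ]; then
echo "EXIT CODES: MATCH ($TS_EXIT)"
else
echo "EXIT CODES: MISMATCH (ts=$TS_EXIT rs=$RS_EXIT)"
fi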
Known Differences to Accept
- Timestamps in state.json will differ between runs (different created_at values). Compare structure and non-timestamp fields only.
- Spinner frame timing may differ slightly. Only compare the final spinner message.
- AI-generated content (branch names from prompts, commit messages, conflict resolutions, reviews) will differ between runs since they involve LLM calls. Verify the format is correct, not the exact text.
- Random branch names from sandlot new (no args) will differ. Verify the format is adjective-noun from the same word lists.
- Order of JSON object keys may differ between serde_json (Rust) and JSON.stringify (TS). Compare semantically.
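Two of the differences above (created_at timestamps and JSON key order) can be normalized away before diffing. This is a sketch under stated assumptions: it requires jq 1.6 or later, which is not among this document's prerequisites, and it assumes created_at is the only timestamp field. normalize_json is a hypothetical name.

```shell
# Hypothetical helper: normalize JSON read from stdin so two runs can be
# diffed. Requires jq 1.6+ (an assumption; not listed in Prerequisites).
#   -S      sorts object keys, hiding serializer key-order differences
#   walk()  visits every value and drops created_at from each object
normalize_json() {
  jq -S 'walk(if type == "object" then del(.created_at) else . end)'
}
```

For example, save `sandlot-ts list --json | normalize_json` and `sandlot-rs list --json | normalize_json` to two files and diff them. The same filter can be applied to copies of .sandlot/state.json taken after each run.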
What Must Be Identical
- All error messages (exact wording, Unicode markers)
- Exit codes for all error and success paths
- File paths (worktree locations, symlink targets, state file location)
- Git operations (same branches created/deleted, same merge behavior)
- Container commands (same container exec invocations, same environment variables)
- Flag parsing (-f, --force, -p, --print, -n, --no-save, --json, -a, --all)
- Default behavior (no args = list)
- Shell init output (init fish, init bash, init zsh) -- these were already verified byte-for-byte identical
- Fish/bash/zsh completions -- already verified byte-for-byte identical