sandlot/rust-sandlot/TESTING.md
2026-04-10 11:13:00 -07:00

Sandlot Rust Rewrite: VM Integration Testing

This document describes how to test the Rust rewrite of sandlot against the TypeScript original. The goal is to verify identical behavior for every command that interacts with the VM/container, git worktrees, or session state.

Prerequisites

  • macOS on Apple Silicon
  • Apple Container installed (brew install container)
  • Rust toolchain (rustup)
  • Bun installed (brew install oven-sh/bun/bun)
  • An ANTHROPIC_API_KEY in ~/.env (format: ANTHROPIC_API_KEY=sk-ant-...)
  • A git repo to use as a test bed (create a throwaway one)

Setup

1. Build the Rust binary

cd rust-sandlot
cargo build --release

The binary is at rust-sandlot/target/release/sandlot relative to the repository root (target/release/sandlot from inside rust-sandlot).

2. Set up aliases

Use two distinct aliases so you can run either implementation:

alias sandlot-ts='bun run /path/to/rust-rewrite/src/cli.ts'
alias sandlot-rs='/path/to/rust-rewrite/rust-sandlot/target/release/sandlot'

3. Destroy any existing VM

Start from a clean slate. Both implementations share the same container name (sandlot), so only one can be tested at a time:

sandlot-ts vm destroy 2>/dev/null

4. Create a test repo

mkdir /tmp/sandlot-test-repo && cd /tmp/sandlot-test-repo
git init -b main  # tests below assume the default branch is main (needs git 2.28+)
echo "hello" > README.md
git add . && git commit -m "initial commit"

All tests below assume you run commands from inside this repo.


Testing methodology

For each test:

  1. Run the command with sandlot-ts first, observe the result
  2. Clean up / reset state
  3. Run the same command with sandlot-rs, observe the result
  4. Compare: stdout content, stderr content, exit code, and side effects (files created, git state, container state)
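
For non-interactive commands, the first three comparison points can be recorded mechanically. The following is a minimal sketch, not part of sandlot: capture is a hypothetical helper name and the three-file layout is an assumption. Aliases are not expanded when a command is passed as arguments, so give it the real entry point rather than sandlot-ts or sandlot-rs.

```shell
# capture <out-dir> <command> [args...]
# Runs the command and records stdout, stderr, and the exit code as three
# files in <out-dir>, so two runs can be compared with diff afterwards.
capture() {
  out_dir=$1
  shift
  mkdir -p "$out_dir" || return 1
  code=0
  "$@" >"$out_dir/stdout" 2>"$out_dir/stderr" || code=$?
  echo "$code" >"$out_dir/exit"
}
```

For example, run capture /tmp/cmp/rs /path/to/rust-rewrite/rust-sandlot/target/release/sandlot list, reset state, run capture /tmp/cmp/ts bun run /path/to/rust-rewrite/src/cli.ts list, then diff -r /tmp/cmp/ts /tmp/cmp/rs. Side effects still have to be checked separately.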

Some commands produce animated spinner output on stderr. The final line of spinner output is what matters (the success/failure message). Intermediate spinner frames are cosmetic and may differ in timing.

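When comparing output, strip ANSI codes for semantic comparison. The escape byte is generated with printf because the BSD sed that ships with macOS may not interpret \x1b:

sandlot-rs list 2>&1 | sed "s/$(printf '\033')\[[0-9;]*m//g"
sandlot-ts list 2>&1 | sed "s/$(printf '\033')\[[0-9;]*m//g"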
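
Stripping codes and picking out the final spinner line can be wrapped in two small filters. This is a sketch under assumptions: strip_ansi and final_line are hypothetical names, and the spinner is assumed to redraw itself with carriage returns, which this document does not confirm.

```shell
# strip_ansi: remove ANSI CSI sequences (colors, cursor movement) from stdin.
# The escape byte comes from printf so the pattern works in both BSD sed
# (macOS) and GNU sed.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;?]*[A-Za-z]//g"
}

# final_line: print the last non-empty line after stripping ANSI codes.
# Assumption: spinner frames are separated by carriage returns, so each \r
# is turned into a line break before the last line is selected.
final_line() {
  strip_ansi | tr '\r' '\n' | awk 'NF { last = $0 } END { print last }'
}
```

For example, sandlot-rs vm stop 2>&1 >/dev/null | final_line should leave only the success or failure message.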

Phase 1: VM Lifecycle

These tests verify container management. Run them in order.

Test 1.1: vm create

sandlot-ts vm destroy 2>/dev/null  # clean slate
sandlot-rs vm create

Expect:

  • Spinner output on stderr progressing through: "Creating VM" -> "Pulling image & creating container" -> "Installing packages" -> "Installing Bun" -> "Installing Claude Code" -> "Installing neofetch" -> "Installing Neovim" -> "Configuring environment"
  • Final line: ✔ VM created
  • Exit code: 0

Verify side effects:

container list --format json --all  # should show "sandlot" container running
container exec sandlot which claude  # should print /home/ubuntu/.local/bin/claude
container exec sandlot which bun     # should print /home/ubuntu/.local/bin/bun
container exec sandlot which fish    # should print /usr/bin/fish
container exec sandlot test -f /home/ubuntu/.claude/settings.json && echo ok
container exec sandlot test -f /home/ubuntu/.claude/api-key-helper.sh && echo ok
container exec sandlot cat /home/ubuntu/.claude.json  # should have hasCompletedOnboarding: true
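
The three which checks can be run in one pass. This is a minimal sketch: check_vm_tools is a hypothetical helper, and it uses only the container exec invocations and expected paths listed above.

```shell
# check_vm_tools: confirm each tool resolves to the path listed in Test 1.1.
# Prints one line per tool and returns non-zero if any path differs.
check_vm_tools() {
  status=0
  while read -r tool expected; do
    # </dev/null keeps the exec call from consuming the here-document.
    actual=$(container exec sandlot which "$tool" </dev/null 2>/dev/null) || actual=""
    if [ "$actual" = "$expected" ]; then
      echo "ok   $tool -> $actual"
    else
      echo "FAIL $tool: expected $expected, got ${actual:-nothing}"
      status=1
    fi
  done <<'EOF'
claude /home/ubuntu/.local/bin/claude
bun /home/ubuntu/.local/bin/bun
fish /usr/bin/fish
EOF
  return "$status"
}
```

Run check_vm_tools once after each implementation's vm create; identical output from both runs covers the tool-path part of the side-effect check.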

Now destroy and repeat with TS:

sandlot-rs vm destroy
sandlot-ts vm create

Verify the same side effects exist.

Test 1.2: vm status

# With VM running:
sandlot-rs vm status
sandlot-ts vm status

Expect (no sessions):

VM: running    (in green)

No active sessions.   (in dim)

# JSON mode:
sandlot-rs vm status --json
sandlot-ts vm status --json

Expect: JSON with "vm": "running" and "sessions": [].

Test 1.3: vm stop

sandlot-rs vm stop

Expect: Spinner, then ✔ VM stopped. Exit code 0.

sandlot-rs vm status

Expect: VM: stopped (in yellow).

Test 1.4: vm start

sandlot-rs vm start

Expect: ✔ VM started on stdout. Exit code 0.

Test 1.5: vm info

sandlot-rs vm info
sandlot-ts vm info

Expect: neofetch output (system info). Both should show identical container specs.

Test 1.6: vm shell

sandlot-rs vm shell

Expect: Drops into an interactive fish shell inside the container. Verify that the prompt responds and that echo $PATH includes the tool directories checked in Test 1.1 (for example /home/ubuntu/.local/bin). Type exit to leave.

Test 1.7: vm destroy

sandlot-rs vm destroy

Expect: Spinner, then ✔ VM destroyed. Exit code 0.

sandlot-rs vm status

Expect: VM: missing (in red).

Test 1.8: vm create (duplicate)

sandlot-rs vm create
# Then try again:
sandlot-rs vm create

Expect second call: Error: Container already exists. Use 'sandlot vm destroy' first to recreate it. Exit code 1.

Test 1.9: vm uncache

sandlot-rs vm uncache

Expect: ✔ Package cache cleared if cache existed, or No cache to clear.

Test 1.10: vm start when missing

sandlot-rs vm destroy
sandlot-rs vm start

Expect: Error: Container does not exist. Use 'sandlot vm create' first. Exit code 1.


Phase 2: Session Lifecycle

Ensure a VM is running before starting: run sandlot-rs vm create, or skip this and let the ensure step create the VM automatically when a session command needs one.

Test 2.1: new with explicit branch name

sandlot-rs new test-branch-1
# Claude launches interactively. Press Ctrl+C or /exit to quit.

Expect:

  • Spinner: "Creating worktree" -> "Starting container" -> ✔ [test-branch-1] Session ready
  • Claude Code launches in the container
  • After exit, auto-save runs (spinner: "Staging changes" -> either "No changes to commit" or "Saved: ...")

Verify side effects:

ls -la ~/.sandlot/sandlot-test-repo/test-branch-1/  # worktree exists
ls -la .sandlot/test-branch-1                         # symlink exists
cat .sandlot/state.json                                # session entry exists
git worktree list                                      # shows the worktree
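
The file-level checks can be scripted so the same assertions run after each implementation. This is a sketch: check_session is a hypothetical helper, and the state.json test assumes only that the branch name appears somewhere in the file as a quoted JSON string, since the file's schema is not described here.

```shell
# check_session <repo-name> <branch>
# Run from the main repo root. Reports each missing artifact and returns
# non-zero if any check fails.
check_session() {
  repo_name=$1
  branch=$2
  status=0
  [ -d "$HOME/.sandlot/$repo_name/$branch" ] || { echo "missing worktree dir"; status=1; }
  [ -L ".sandlot/$branch" ] || { echo "missing symlink .sandlot/$branch"; status=1; }
  # Assumption: the branch name appears as a quoted string in state.json.
  grep -q "\"$branch\"" .sandlot/state.json 2>/dev/null || { echo "no state.json entry"; status=1; }
  return "$status"
}
```

For example, check_session sandlot-test-repo test-branch-1. The git worktree list output still needs to be inspected by hand.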

Test 2.2: new with no branch (random name)

sandlot-rs new

Expect: A random adjective-noun branch name is generated (e.g., calm-fern). The rest of the flow is identical to 2.1.

Test 2.3: new with prompt (spaces in "branch")

sandlot-rs new "fix the login bug on the settings page"

Expect: The text is treated as a prompt. A branch name is derived via Claude Haiku API (e.g., login-fix). If the API call fails, falls back to first two words (fix-the). The prompt is stored in state.json.
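
To know which branch name to expect when the API call fails, the fallback rule can be reproduced by hand. This is a sketch of the rule as described above, not sandlot's implementation: fallback_branch is a hypothetical name, and it assumes no lowercasing or character filtering is applied.

```shell
# fallback_branch <prompt>: join the first two whitespace-separated words
# with a hyphen, mirroring the documented fallback (e.g. "fix-the").
# Assumption: the words are used as-is, with no case or character changes.
fallback_branch() {
  printf '%s\n' "$1" | awk '{ print $1 "-" $2 }'
}
```

For the prompt in this test, fallback_branch "fix the login bug on the settings page" yields fix-the.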

Test 2.4: new with -p (print mode)

sandlot-rs new -p "what is 2+2"

Expect:

  • Branch name derived from the prompt
  • Spinner: "Creating worktree" -> "Starting container" -> "Running prompt..."
  • Claude's response printed to stdout (rendered as markdown)
  • No interactive session
  • Auto-save runs after

Test 2.5: new duplicate session

sandlot-rs new test-branch-1

Expect: ✖ Session "test-branch-1" already exists. Use "sandlot open test-branch-1" to re-enter it. Exit code 1.

Test 2.6: list with sessions

sandlot-rs list

Expect:

  BRANCH          PROMPT
◯ test-branch-1
◯ other-branch    fix the login bug...

◯ idle · ◎ active · ◐ unsaved · ● saved · ⦿ review

Status icons use ANSI colors (dim for idle, cyan for active, yellow for unsaved, green for saved, magenta for review).

sandlot-rs list --json

Expect: JSON array with each session having branch, worktree, created_at, prompt, in_review, status, repoRoot fields.
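
The field list can be checked mechanically rather than by eye. This sketch assumes jq is installed (it is not in the prerequisites), and has_session_fields is a hypothetical helper name.

```shell
# has_session_fields: read `list --json` output on stdin and succeed only if
# every array element carries all seven documented keys.
has_session_fields() {
  jq -e '
    all(.[];
      has("branch") and has("worktree") and has("created_at") and
      has("prompt") and has("in_review") and has("status") and has("repoRoot"))
  ' >/dev/null
}
```

For example, sandlot-rs list --json | has_session_fields && echo ok. It checks key presence only, not value types.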

Test 2.7: open existing session

sandlot-rs open test-branch-1

Expect:

  • Spinner: "Starting container" -> ✔ [test-branch-1] Session ready
  • Claude launches with --continue (resumes prior conversation)
  • After exit, auto-save runs

Test 2.8: open with --no-save

sandlot-rs open test-branch-1 --no-save

Expect: Same as 2.7 but no auto-save after Claude exits.

Test 2.9: open nonexistent session but existing branch

If you manually create a branch and remove the session from state.json, open should recreate the session:

# Remove from state but keep the branch
cat .sandlot/state.json  # note the session
# Manually edit state.json to remove the session entry
sandlot-rs open test-branch-1

Expect: Worktree is recreated, session is re-added to state, Claude launches.

Test 2.10: open nonexistent branch

sandlot-rs open nonexistent-branch-xyz

Expect: ✖ No session or branch found for "nonexistent-branch-xyz". Exit code 1.


Phase 3: Branch Operations (read-only)

These commands read git state without modifying it. Create a session with some commits first:

sandlot-rs new branch-ops-test
# Inside Claude, make some changes and commit, then exit
# Or manually:
cd ~/.sandlot/sandlot-test-repo/branch-ops-test
echo "new file" > test.txt
git add . && git commit -m "add test file"
cd /tmp/sandlot-test-repo

Test 3.1: diff

sandlot-rs diff branch-ops-test

Expect:

  • If uncommitted changes in worktree: shows git diff HEAD
  • If clean: shows git diff main...branch-ops-test
  • Output piped through git's native diff display (with colors if terminal supports)

Compare with:

sandlot-ts diff branch-ops-test

Test 3.2: log

sandlot-rs log branch-ops-test

Expect:

  • If the session has a prompt, prints PROMPT: <text> to stderr first
  • Shows git log main..HEAD output with commit hashes highlighted in yellow
  • Piped through pager if output exceeds terminal height

Test 3.3: show

sandlot-rs show branch-ops-test

Expect:

  • Prints prompt to stderr (if stored)
  • Shows full git diff main...branch output on stdout

Test 3.4: web

sandlot-rs web branch-ops-test

Expect:

  • Generates /tmp/sandlot-branch-ops-test.html
  • Opens it in the default browser
  • HTML contains: branch name, prompt, commit log, diff stats, syntax-highlighted diff

Verify: Open the generated HTML file and compare it with the one generated by sandlot-ts web branch-ops-test.

Test 3.5: dir

sandlot-rs dir branch-ops-test

Expect: Prints the absolute worktree path to stdout, e.g., /Users/you/.sandlot/sandlot-test-repo/branch-ops-test.

Test 3.6: dir nonexistent session

sandlot-rs dir nonexistent

Expect: ✖ No session found for branch "nonexistent". Exit code 1.


Phase 4: Save, Merge, Squash, Rebase

Test 4.1: save with auto-generated message

# Make changes in the worktree first:
echo "change" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs save branch-ops-test

Expect:

  • Spinner: [branch-ops-test] Staging changes -> Starting container -> Generating commit message -> Committing -> ✔ [branch-ops-test] Saved: <commit message>
  • The commit message is AI-generated from the diff

Test 4.2: save with explicit message

echo "another change" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs save branch-ops-test "manual commit message"

Expect:

  • Spinner: Staging changes -> Committing -> ✔ [branch-ops-test] Saved: manual commit message
  • No AI generation (no container startup needed for the message)

Test 4.3: save with no changes

sandlot-rs save branch-ops-test

Expect: ✖ [branch-ops-test] No changes to commit. Exit code 1.

Test 4.4: squash

# Ensure branch has multiple commits beyond main
sandlot-rs squash branch-ops-test

Expect:

  • Spinner: [branch-ops-test] Squashing -> Starting container -> Generating commit message -> ✔ [branch-ops-test] Squashed branch-ops-test into a single commit
  • git log main..HEAD in the worktree should show exactly 1 commit
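
Counting commits is less error-prone than reading the log. This is a minimal sketch: commits_ahead is a hypothetical helper built on git rev-list --count.

```shell
# commits_ahead <worktree> [base]: print how many commits HEAD has that the
# base branch (default: main) does not.
commits_ahead() {
  git -C "$1" rev-list --count "${2:-main}..HEAD"
}
```

After a successful squash, commits_ahead ~/.sandlot/sandlot-test-repo/branch-ops-test should print 1.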

Test 4.5: squash with no commits

sandlot-rs new fresh-branch
# Exit Claude immediately without making changes
sandlot-rs squash fresh-branch

Expect: ✖ Branch "fresh-branch" has no commits beyond main. Exit code 1.

Test 4.6: squash with dirty worktree

echo "dirty" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs squash branch-ops-test

Expect: ✖ Branch "branch-ops-test" has unsaved changes. Run "sandlot save branch-ops-test" first. Exit code 1.

Test 4.7: rebase

Set up a scenario where main has advanced:

# In the main repo, add a commit to main
cd /tmp/sandlot-test-repo
echo "main change" > main-file.txt
git add . && git commit -m "advance main"

sandlot-rs rebase branch-ops-test

Expect (clean rebase):

  • Spinner: [branch-ops-test] Fetching origin -> Rebasing onto origin/main -> ✔ [branch-ops-test] Rebased branch-ops-test onto main

Expect (with conflicts):

  • ◆ Rebase conflicts in N file(s). Resolving with Claude...
  • Spinner: [branch-ops-test] Starting container -> (1/N) Resolving <file> (round 1) -> ✔ [branch-ops-test] Rebased branch-ops-test onto main (resolved N conflict round(s))

Test 4.8: rebase with dirty worktree

echo "dirty" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs rebase branch-ops-test

Expect: ✖ Branch "branch-ops-test" has unsaved changes. Run "sandlot save branch-ops-test" first. Exit code 1.

Test 4.9: merge

cd /tmp/sandlot-test-repo
git checkout main
sandlot-rs merge branch-ops-test

Expect (clean merge):

  • Spinner: Merging branch-ops-test -> ✔ Merged branch-ops-test into main
  • Session is torn down (worktree removed, symlink removed, state cleared)
  • Local branch is deleted

Expect (with conflicts):

  • Spinner: Resolving N conflict(s) -> Starting container -> (1/N) Resolving <file> -> ✔ Resolved N conflict(s) and merged branch-ops-test
  • Same cleanup as clean merge

Test 4.10: merge not on main

git checkout -b other-branch
sandlot-rs merge some-branch

Expect: ✖ You must be on "main" to merge. Currently on "other-branch". Use --force to merge into "other-branch" anyway. Exit code 1.

Test 4.11: merge --force on non-main

sandlot-rs merge some-branch --force

Expect: Merge proceeds into other-branch instead of main.

Test 4.12: merge with dirty session

echo "dirty" >> ~/.sandlot/sandlot-test-repo/some-branch/file.txt
sandlot-rs merge some-branch

Expect: ✖ Branch "some-branch" has unsaved changes. Run "sandlot save some-branch" first. Exit code 1.


Phase 5: Review

Test 5.1: review interactive

sandlot-rs review branch-ops-test

Expect:

  • Spinner: [branch-ops-test] Starting container -> ✔ [branch-ops-test] Session ready
  • Claude launches with the review prompt (4-agent grumpy senior engineer review)
  • state.json shows in_review: true during the review
  • After exit: in_review is cleared, auto-save runs

Test 5.2: review --print

sandlot-rs review branch-ops-test --print

Expect:

  • Spinner: [branch-ops-test] Starting container -> Running review...
  • Review output printed to stdout (not interactive)
  • No auto-save after

Test 5.3: review with extra prompt

sandlot-rs review branch-ops-test "also check for SQL injection"

Expect: The extra text is appended to the review prompt. Claude receives both the standard review instructions and the additional context.


Phase 6: Shell and Edit

Test 6.1: shell with branch

sandlot-rs shell branch-ops-test

Expect: Interactive fish shell opens in the worktree directory inside the container. pwd should show the container-translated worktree path.

Test 6.2: shell without branch

sandlot-rs shell

Expect: Interactive fish shell opens at a default location (no --workdir flag).

Test 6.3: edit

export EDITOR=vim
sandlot-rs edit branch-ops-test test.txt

Expect: vim opens the file at the worktree path. After closing, exits cleanly.

Test 6.4: edit with missing EDITOR

unset EDITOR
sandlot-rs edit branch-ops-test test.txt

Expect: ✖ $EDITOR is not set. Exit code 1.

Test 6.5: edit with missing file

export EDITOR=vim
sandlot-rs edit branch-ops-test nonexistent.txt

Expect: ✖ File not found: nonexistent.txt, with exit code 1.

Test 6.6: edit path escape attempt

sandlot-rs edit branch-ops-test ../../etc/passwd

Expect: Error (path escapes the worktree). The exact message may vary but should prevent access.


Phase 7: Close and Checkout

Test 7.1: close clean session

sandlot-rs close test-branch-1

Expect:

  • ✔ Closed session test-branch-1 on stdout
  • Worktree removed from ~/.sandlot/...
  • Symlink removed from .sandlot/test-branch-1
  • Session removed from state.json
  • Local branch deleted
  • Exit code 0

Test 7.2: close dirty session

# Set up a dirty session first
sandlot-rs new dirty-test
echo "uncommitted" > ~/.sandlot/sandlot-test-repo/dirty-test/uncommitted.txt
sandlot-rs close dirty-test

Expect: ✖ Branch "dirty-test" has unsaved changes. Run "sandlot save dirty-test" first, or use -f to force. Exit code 1.

Test 7.3: close --force dirty session

sandlot-rs close dirty-test --force

Expect: ✔ Closed session dirty-test. Session is torn down despite uncommitted changes.

Test 7.4: rm alias

sandlot-rs rm some-branch

Expect: Identical to close. The rm command is a hidden alias.

Test 7.5: close nonexistent session

sandlot-rs close nonexistent-xyz

Expect: ✖ No session found for branch "nonexistent-xyz". Exit code 1.

Test 7.6: checkout

sandlot-rs new checkout-test
# Make a commit
echo "data" > ~/.sandlot/sandlot-test-repo/checkout-test/data.txt
cd ~/.sandlot/sandlot-test-repo/checkout-test && git add . && git commit -m "data"
cd /tmp/sandlot-test-repo
sandlot-rs checkout checkout-test

Expect:

  • ✔ Checked out checkout-test
  • Session torn down (worktree, symlink, state removed)
  • git branch in main repo shows you're now on checkout-test
  • Branch is NOT deleted (unlike close and merge)

Test 7.7: checkout with dirty main worktree

echo "dirty" > /tmp/sandlot-test-repo/dirty.txt
sandlot-rs checkout some-branch

Expect: ✖ Working tree has uncommitted changes that may conflict with checkout. Commit or stash them first, or use -f to force. Exit code 1.

Test 7.8: checkout --force with dirty main worktree

sandlot-rs checkout some-branch --force

Expect: Proceeds despite dirty working tree.


Phase 8: Cleanup and Upgrade

Test 8.1: cleanup with stale sessions

# Create a session, then manually delete the worktree
sandlot-rs new stale-test
rm -rf ~/.sandlot/sandlot-test-repo/stale-test
sandlot-rs cleanup

Expect: ✔ Removed stale session: stale-test. Session removed from state.json.

Test 8.2: cleanup with no stale sessions

sandlot-rs cleanup

Expect: No stale sessions found. (or No sessions to clean up. if no sessions at all).

Test 8.3: upgrade

sandlot-rs upgrade

Expect: Attempts to upgrade sandlot. Compare behavior with sandlot-ts upgrade. Both should attempt the same upgrade mechanism.


Phase 9: List Status Resolution

This tests that list correctly resolves session status.

Test 9.1: Idle session

sandlot-rs new idle-test
# Exit Claude immediately, no changes
sandlot-rs list

Expect: idle-test shows ◯ (dim circle) = idle.

Test 9.2: Dirty session

echo "dirty" > ~/.sandlot/sandlot-test-repo/idle-test/dirty.txt
sandlot-rs list

Expect: idle-test shows ◐ (yellow half-circle) = unsaved.

Test 9.3: Saved session

cd ~/.sandlot/sandlot-test-repo/idle-test
git add . && git commit -m "save"
cd /tmp/sandlot-test-repo
sandlot-rs list

Expect: idle-test shows ● (green circle) = saved.

Test 9.4: list --all

sandlot-rs list --all

Expect: Sessions grouped by repo name with headers:

── repo-name ──
  BRANCH  PROMPT
◯ branch  prompt text

Test 9.5: list with no sessions

# Close all sessions first
sandlot-rs list

Expect: ◆ No active sessions.

Test 9.6: list with VM down

sandlot-rs vm stop
sandlot-rs list

Expect: Normal session list (every session shows as idle, because status cannot be resolved while the VM is stopped), plus:

VM is not running.    (in red)

Phase 10: End-to-End Comparison

For each command tested above, run the same scenario with both sandlot-ts and sandlot-rs and compare:

  1. Exit codes must be identical
  2. Stdout content must be semantically identical (exact match after stripping ANSI if formatting differs)
  3. Stderr content must match (error messages, spinner final lines)
  4. Side effects must match:
    • Same files created/deleted
    • Same git state (branches, worktrees, commits)
    • Same state.json content (modulo timestamps)
    • Same container state
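
The first three criteria can be checked automatically once each run's output is on disk. This is a sketch under assumptions: compare_runs is a hypothetical helper, each directory is assumed to hold files named stdout, stderr, and exit from one run, and spinner frames are assumed to be separated by carriage returns.

```shell
esc=$(printf '\033')

# _clean <file>: print the file with ANSI CSI sequences removed.
_clean() { sed "s/${esc}\[[0-9;?]*[A-Za-z]//g" "$1"; }

# _last <file>: print the last non-empty line, treating \r as a line break.
_last() { _clean "$1" | tr '\r' '\n' | awk 'NF { l = $0 } END { print l }'; }

# compare_runs <ts-dir> <rs-dir>
# Compares exit codes, ANSI-stripped stdout, and the final stderr line.
compare_runs() {
  ts=$1
  rs=$2
  status=0
  [ "$(cat "$ts/exit")" = "$(cat "$rs/exit")" ] || { echo "exit codes differ"; status=1; }
  [ "$(_clean "$ts/stdout")" = "$(_clean "$rs/stdout")" ] || { echo "stdout differs"; status=1; }
  [ "$(_last "$ts/stderr")" = "$(_last "$rs/stderr")" ] || { echo "final stderr line differs"; status=1; }
  return "$status"
}
```

Comparing only the final stderr line is a deliberate choice, since intermediate spinner frames are cosmetic. Criterion 4 (side effects) is out of scope for this helper.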

Comparison script

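#!/bin/bash
# Compare one command between the TypeScript and Rust implementations.
# Pass the sandlot arguments to this script, e.g.: vm status --json
# Aliases are not expanded inside scripts, so the entry points are spelled out
# here (same placeholder paths as the aliases in Setup).
SANDLOT_TS=(bun run /path/to/rust-rewrite/src/cli.ts)
SANDLOT_RS=(/path/to/rust-rewrite/rust-sandlot/target/release/sandlot)

echo "=== TypeScript ==="
# "$@" keeps each argument intact, including prompts that contain spaces.
"${SANDLOT_TS[@]}" "$@" 2>/tmp/ts-stderr; TS_EXIT=$?
echo "EXIT: $TS_EXIT"
cat /tmp/ts-stderr

echo ""
echo "=== Rust ==="
"${SANDLOT_RS[@]}" "$@" 2>/tmp/rs-stderr; RS_EXIT=$?
echo "EXIT: $RS_EXIT"
cat /tmp/rs-stderr

echo ""
if [ "$TS_EXIT" = "$RS_EXIT" ]; then
  echo "EXIT CODES: MATCH ($TS_EXIT)"
else
  echo "EXIT CODES: MISMATCH (ts=$TS_EXIT rs=$RS_EXIT)"
fi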

Known Differences to Accept

  • Timestamps in state.json will differ between runs (different created_at values). Compare structure and non-timestamp fields only.
  • Spinner frame timing may differ slightly. Only compare the final spinner message.
  • AI-generated content (branch names from prompts, commit messages, conflict resolutions, reviews) will differ between runs since they involve LLM calls. Verify the format is correct, not the exact text.
  • Random branch names from sandlot new (no args) will differ. Verify the format is adjective-noun from the same word lists.
  • Order of JSON object keys may differ between serde_json (Rust) and JSON.stringify (TS). Compare semantically.
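
Key order and timestamps can both be normalized away before diffing. This sketch assumes jq is installed (it is not in the prerequisites) and that created_at is the only timestamp field; normalize_json is a hypothetical helper name.

```shell
# normalize_json <file>: print the file with object keys sorted and every
# created_at field removed, so two state files can be compared with diff.
normalize_json() {
  jq -S 'del(.. | .created_at?)' "$1"
}
```

For example, save a copy of .sandlot/state.json after each implementation's run, write normalize_json output for both copies to temporary files, and diff those files.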

What Must Be Identical

  • All error messages (exact wording, Unicode markers)
  • Exit codes for all error and success paths
  • File paths (worktree locations, symlink targets, state file location)
  • Git operations (same branches created/deleted, same merge behavior)
  • Container commands (same container exec invocations, same environment variables)
  • Flag parsing (-f, --force, -p, --print, -n, --no-save, --json, -a, --all)
  • Default behavior (no args = list)
  • Shell init output (init fish, init bash, init zsh) -- these were already verified byte-for-byte identical
  • Fish/bash/zsh completions -- already verified byte-for-byte identical