Chris Wanstrath 92b64fcf3c rust rewrite

2026-04-10 11:13:00 -07:00

Sandlot Rust Rewrite: VM Integration Testing

This document describes how to test the Rust rewrite of sandlot against the TypeScript original. The goal is to verify identical behavior for every command that interacts with the VM/container, git worktrees, or session state.

Prerequisites

macOS on Apple Silicon
Apple Container installed (brew install container)
Rust toolchain (rustup)
Bun installed (brew install oven-sh/bun/bun)
An ANTHROPIC_API_KEY in ~/.env (format: ANTHROPIC_API_KEY=sk-ant-...)
A git repo to use as a test bed (create a throwaway one)

Setup

1. Build the Rust binary

cd rust-sandlot
cargo build --release

The binary is at ./rust-sandlot/target/release/sandlot.

2. Set up aliases

Use two distinct aliases so you can run either implementation:

alias sandlot-ts='bun run /path/to/rust-rewrite/src/cli.ts'
alias sandlot-rs='/path/to/rust-rewrite/rust-sandlot/target/release/sandlot'

3. Destroy any existing VM

Start from a clean slate. Both implementations share the same container name (sandlot), so only one can be tested at a time:

sandlot-ts vm destroy 2>/dev/null

4. Create a test repo

mkdir /tmp/sandlot-test-repo && cd /tmp/sandlot-test-repo
git init
echo "hello" > README.md
git add . && git commit -m "initial commit"

All tests below assume you run commands from inside this repo.

Testing methodology

For each test:

Run the command with sandlot-ts first, observe the result
Clean up / reset state
Run the same command with sandlot-rs, observe the result
Compare: stdout content, stderr content, exit code, and side effects (files created, git state, container state)
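
The comparison in the last step can be sketched as a pair of shell helpers that record stdout, stderr, and the exit code of one run and then check two runs against each other. This is only a sketch: the function names `capture` and `same_result` and the file layout under the output directory are hypothetical, not part of sandlot.

```shell
# capture <label> <outdir> <command...>
# Runs the command and stores stdout, stderr, and the exit code in
# <outdir>/<label>.stdout, .stderr, and .exit (hypothetical layout).
capture() {
  label=$1; outdir=$2; shift 2
  mkdir -p "$outdir"
  "$@" >"$outdir/$label.stdout" 2>"$outdir/$label.stderr"
  echo $? >"$outdir/$label.exit"
}

# same_result <outdir> <label_a> <label_b>
# Succeeds only if stdout, stderr, and exit code are byte-identical.
same_result() {
  cmp -s "$1/$2.stdout" "$1/$3.stdout" &&
  cmp -s "$1/$2.stderr" "$1/$3.stderr" &&
  cmp -s "$1/$2.exit" "$1/$3.exit"
}
```

Pass the real command rather than an alias, because aliases are not expanded when a command is handed to a function as arguments: for example `capture rs /tmp/cmp /path/to/rust-rewrite/rust-sandlot/target/release/sandlot vm status`, the equivalent `capture ts /tmp/cmp bun run /path/to/rust-rewrite/src/cli.ts vm status`, then `same_result /tmp/cmp ts rs`. For commands that draw a spinner, a byte-for-byte stderr match is too strict; compare only the final message.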

Some commands produce animated spinner output on stderr. The final line of spinner output is what matters (the success/failure message). Intermediate spinner frames are cosmetic and may differ in timing.

When comparing output, strip ANSI codes for semantic comparison:

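# BSD sed on macOS does not interpret \x1b, so let printf produce the ESC byte
sandlot-rs list 2>&1 | sed "s/$(printf '\033')\[[0-9;]*m//g"
sandlot-ts list 2>&1 | sed "s/$(printf '\033')\[[0-9;]*m//g"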
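
To avoid retyping the filter, the stripping step could be wrapped in a small function. This is a sketch; the name `strip_ansi` is hypothetical, and it removes only color (SGR) sequences ending in `m`, not cursor-movement codes that a spinner may emit.

```shell
# strip_ansi: remove ANSI color (SGR) sequences from stdin.
# The ESC byte is produced by printf, so the filter works with both
# BSD sed (macOS) and GNU sed.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g"
}
```

In an interactive bash or zsh session, where the aliases are available, the two outputs can then be compared with `diff <(sandlot-ts list 2>&1 | strip_ansi) <(sandlot-rs list 2>&1 | strip_ansi)`.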

Phase 1: VM Lifecycle

These tests verify container management. Run them in order.

Test 1.1: `vm create`

sandlot-ts vm destroy 2>/dev/null  # clean slate
sandlot-rs vm create

Expect:

Spinner output on stderr progressing through: "Creating VM" -> "Pulling image & creating container" -> "Installing packages" -> "Installing Bun" -> "Installing Claude Code" -> "Installing neofetch" -> "Installing Neovim" -> "Configuring environment"
Final line: ✔ VM created
Exit code: 0

Verify side effects:

container list --format json --all  # should show "sandlot" container running
container exec sandlot which claude  # should print /home/ubuntu/.local/bin/claude
container exec sandlot which bun     # should print /home/ubuntu/.local/bin/bun
container exec sandlot which fish    # should print /usr/bin/fish
container exec sandlot test -f /home/ubuntu/.claude/settings.json && echo ok
container exec sandlot test -f /home/ubuntu/.claude/api-key-helper.sh && echo ok
container exec sandlot cat /home/ubuntu/.claude.json  # should have hasCompletedOnboarding: true
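
The three `which` checks above can be run in one pass with a helper like the following. It is a sketch: `check_tool` and `check_vm_tools` are hypothetical names, and the expected paths are copied from the comments above.

```shell
# check_tool <name> <expected_path>
# Asks the sandlot container where <name> lives and compares the answer.
check_tool() {
  actual=$(container exec sandlot which "$1" 2>/dev/null)
  if [ "$actual" = "$2" ]; then
    echo "ok   $1 -> $actual"
  else
    echo "FAIL $1: expected $2, got '$actual'"
    return 1
  fi
}

# check_vm_tools: run all three path checks; non-zero if any of them fails.
check_vm_tools() {
  rc=0
  check_tool claude /home/ubuntu/.local/bin/claude || rc=1
  check_tool bun    /home/ubuntu/.local/bin/bun    || rc=1
  check_tool fish   /usr/bin/fish                  || rc=1
  return $rc
}
```

Run `check_vm_tools` once after `sandlot-rs vm create` and once after `sandlot-ts vm create`; both runs should print three `ok` lines.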

Now destroy and repeat with TS:

sandlot-rs vm destroy
sandlot-ts vm create

Verify the same side effects exist.

Test 1.2: `vm status`

# With VM running:
sandlot-rs vm status
sandlot-ts vm status

Expect (no sessions):

VM: running    (in green)

No active sessions.   (in dim)

# JSON mode:
sandlot-rs vm status --json
sandlot-ts vm status --json

Expect: JSON with "vm": "running" and "sessions": [].

Test 1.3: `vm stop`

sandlot-rs vm stop

Expect: Spinner, then ✔ VM stopped. Exit code 0.

sandlot-rs vm status

Expect: VM: stopped (in yellow).

Test 1.4: `vm start`

sandlot-rs vm start

Expect: ✔ VM started on stdout. Exit code 0.

Test 1.5: `vm info`

sandlot-rs vm info
sandlot-ts vm info

Expect: neofetch output (system info). Both should show identical container specs.

Test 1.6: `vm shell`

sandlot-rs vm shell

Expect: Drops into an interactive fish shell inside the container. Type exit to leave. Verify the prompt works and echo $PATH includes the expected paths.

Test 1.7: `vm destroy`

sandlot-rs vm destroy

Expect: Spinner, then ✔ VM destroyed. Exit code 0.

sandlot-rs vm status

Expect: VM: missing (in red).

Test 1.8: `vm create` (duplicate)

sandlot-rs vm create
# Then try again:
sandlot-rs vm create

Expect second call: Error: Container already exists. Use 'sandlot vm destroy' first to recreate it. Exit code 1.

Test 1.9: `vm uncache`

sandlot-rs vm uncache

Expect: ✔ Package cache cleared if cache existed, or No cache to clear.

Test 1.10: `vm start` when missing

sandlot-rs vm destroy
sandlot-rs vm start

Expect: Error: Container does not exist. Use 'sandlot vm create' first. Exit code 1.
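
Many tests in this document expect a specific message together with exit code 1. A helper along these lines can assert both at once. It is a sketch with a hypothetical name, `expect_failure`, and it searches combined stdout and stderr because this document does not always say which stream carries the message.

```shell
# expect_failure <expected_code> <expected_text> <command...>
# Succeeds if the command exits with <expected_code> and its combined
# stdout/stderr contains <expected_text> as a literal substring.
expect_failure() {
  want_code=$1; want_text=$2; shift 2
  output=$("$@" 2>&1)
  got_code=$?
  if [ "$got_code" -ne "$want_code" ]; then
    echo "exit code: expected $want_code, got $got_code"
    return 1
  fi
  if ! printf '%s\n' "$output" | grep -Fq -- "$want_text"; then
    echo "message not found: $want_text"
    return 1
  fi
}
```

Aliases are not expanded when a command is passed as arguments, so give the full binary path: `expect_failure 1 "Container does not exist" /path/to/rust-rewrite/rust-sandlot/target/release/sandlot vm start`.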

Phase 2: Session Lifecycle

Before starting, make sure a VM is running: run sandlot-rs vm create, or rely on the ensure step, which auto-creates the VM.

Test 2.1: `new` with explicit branch name

sandlot-rs new test-branch-1
# Claude launches interactively. Press Ctrl+C or /exit to quit.

Expect:

Spinner: "Creating worktree" -> "Starting container" -> ✔ [test-branch-1] Session ready
Claude Code launches in the container
After exit, auto-save runs (spinner: "Staging changes" -> either "No changes to commit" or "Saved: ...")

Verify side effects:

ls -la ~/.sandlot/sandlot-test-repo/test-branch-1/  # worktree exists
ls -la .sandlot/test-branch-1                         # symlink exists
cat .sandlot/state.json                                # session entry exists
git worktree list                                      # shows the worktree
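
The same three artifacts (worktree, symlink, state entry) are checked again whenever a session is torn down, so a reusable helper may be convenient. The sketch below makes two assumptions that this document does not confirm: the worktree directory is named after the repo's basename, and the branch name appears as a quoted string somewhere in state.json. The names `report` and `session_artifacts` are hypothetical.

```shell
# report <name> <test-flag> <path>: print whether the path passes the test.
report() {
  if [ "$2" "$3" ]; then echo "$1 present"; else echo "$1 absent"; fi
}

# session_artifacts <repo_root> <branch>
# Prints one line each for the worktree, the symlink, and the state entry.
session_artifacts() {
  repo_root=$1; branch=$2
  report worktree -d "$HOME/.sandlot/$(basename "$repo_root")/$branch"
  report symlink  -L "$repo_root/.sandlot/$branch"
  # Assumption: the branch name is stored as a quoted JSON string.
  if grep -Fq "\"$branch\"" "$repo_root/.sandlot/state.json" 2>/dev/null; then
    echo "state present"
  else
    echo "state absent"
  fi
}
```

After `new`, `session_artifacts /tmp/sandlot-test-repo test-branch-1` should print three `present` lines; after `close`, `merge`, or `checkout` it should print three `absent` lines.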

Test 2.2: `new` with no branch (random name)

sandlot-rs new

Expect: A random adjective-noun branch name is generated (e.g., calm-fern). The rest of the flow is identical to 2.1.
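
Since the generated name differs on every run, only its shape can be checked. A minimal sketch, assuming both words are lowercase ASCII (the function name `is_adjective_noun` is hypothetical, and it does not verify membership in the word lists):

```shell
# is_adjective_noun <branch>: succeeds if the name is two lowercase
# words joined by a single hyphen. Checks the shape only.
is_adjective_noun() {
  printf '%s\n' "$1" | grep -Eq '^[a-z]+-[a-z]+$'
}
```

For example, `is_adjective_noun calm-fern && echo ok` prints `ok`, while a name such as `test-branch-1` is rejected.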

Test 2.3: `new` with prompt (spaces in "branch")

sandlot-rs new "fix the login bug on the settings page"

Expect: The text is treated as a prompt. A branch name is derived via the Claude Haiku API (e.g., login-fix). If the API call fails, the name falls back to the first two words of the prompt (fix-the). The prompt is stored in state.json.

Test 2.4: `new` with `-p` (print mode)

sandlot-rs new -p "what is 2+2"

Expect:

Branch name derived from the prompt
Spinner: "Creating worktree" -> "Starting container" -> "Running prompt..."
Claude's response printed to stdout (rendered as markdown)
No interactive session
Auto-save runs after

Test 2.5: `new` duplicate session

sandlot-rs new test-branch-1

Expect: ✖ Session "test-branch-1" already exists. Use "sandlot open test-branch-1" to re-enter it. Exit code 1.

Test 2.6: `list` with sessions

sandlot-rs list

Expect:

  BRANCH          PROMPT
◯ test-branch-1
◯ other-branch    fix the login bug...

◯ idle · ◎ active · ◐ unsaved · ● saved · ⦿ review

Status icons use ANSI colors (dim for idle, cyan for active, yellow for unsaved, green for saved, magenta for review).

sandlot-rs list --json

Expect: JSON array with each session having branch, worktree, created_at, prompt, in_review, status, repoRoot fields.

Test 2.7: `open` existing session

sandlot-rs open test-branch-1

Expect:

Spinner: "Starting container" -> ✔ [test-branch-1] Session ready
Claude launches with --continue (resumes prior conversation)
After exit, auto-save runs

Test 2.8: `open` with `--no-save`

sandlot-rs open test-branch-1 --no-save

Expect: Same as 2.7 but no auto-save after Claude exits.

Test 2.9: `open` nonexistent session but existing branch

If you manually create a branch and remove the session from state.json, open should recreate the session:

# Remove from state but keep the branch
cat .sandlot/state.json  # note the session
# Manually edit state.json to remove the session entry
sandlot-rs open test-branch-1

Expect: Worktree is recreated, session is re-added to state, Claude launches.

Test 2.10: `open` nonexistent branch

sandlot-rs open nonexistent-branch-xyz

Expect: ✖ No session or branch found for "nonexistent-branch-xyz". Exit code 1.

Phase 3: Branch Operations (read-only)

These commands read git state without modifying it. Create a session with some commits first:

sandlot-rs new branch-ops-test
# Inside Claude, make some changes and commit, then exit
# Or manually:
cd ~/.sandlot/sandlot-test-repo/branch-ops-test
echo "new file" > test.txt
git add . && git commit -m "add test file"
cd /tmp/sandlot-test-repo

Test 3.1: `diff`

sandlot-rs diff branch-ops-test

Expect:

If uncommitted changes in worktree: shows git diff HEAD
If clean: shows git diff main...branch-ops-test
Output piped through git's native diff display (with colors if terminal supports)

Compare with:

sandlot-ts diff branch-ops-test

Test 3.2: `log`

sandlot-rs log branch-ops-test

Expect:

If the session has a prompt, prints PROMPT: <text> to stderr first
Shows git log main..HEAD output with commit hashes highlighted in yellow
Piped through pager if output exceeds terminal height

Test 3.3: `show`

sandlot-rs show branch-ops-test

Expect:

Prints prompt to stderr (if stored)
Shows full git diff main...branch output on stdout

Test 3.4: `web`

sandlot-rs web branch-ops-test

Expect:

Generates /tmp/sandlot-branch-ops-test.html
Opens it in the default browser
HTML contains: branch name, prompt, commit log, diff stats, syntax-highlighted diff

Verify: Open the generated HTML file and compare it with the one generated by sandlot-ts web branch-ops-test.

Test 3.5: `dir`

sandlot-rs dir branch-ops-test

Expect: Prints the absolute worktree path to stdout, e.g., /Users/you/.sandlot/sandlot-test-repo/branch-ops-test.

Test 3.6: `dir` nonexistent session

sandlot-rs dir nonexistent

Expect: ✖ No session found for branch "nonexistent". Exit code 1.

Phase 4: Save, Merge, Squash, Rebase

Test 4.1: `save` with auto-generated message

# Make changes in the worktree first:
echo "change" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs save branch-ops-test

Expect:

Spinner: [branch-ops-test] Staging changes -> Starting container -> Generating commit message -> Committing -> ✔ [branch-ops-test] Saved: <commit message>
The commit message is AI-generated from the diff

Test 4.2: `save` with explicit message

echo "another change" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs save branch-ops-test "manual commit message"

Expect:

Spinner: Staging changes -> Committing -> ✔ [branch-ops-test] Saved: manual commit message
No AI generation (no container startup needed for the message)

Test 4.3: `save` with no changes

sandlot-rs save branch-ops-test

Expect: ✖ [branch-ops-test] No changes to commit. Exit code 1.

Test 4.4: `squash`

# Ensure branch has multiple commits beyond main
sandlot-rs squash branch-ops-test

Expect:

Spinner: [branch-ops-test] Squashing -> Starting container -> Generating commit message -> ✔ [branch-ops-test] Squashed branch-ops-test into a single commit
git log main..HEAD in the worktree should show exactly 1 commit
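
Counting the commits is easier to assert than reading the log output. A small sketch using `git rev-list --count` (the function name `commits_beyond_main` is hypothetical):

```shell
# commits_beyond_main <worktree>
# Prints how many commits HEAD has that main does not.
commits_beyond_main() {
  git -C "$1" rev-list --count main..HEAD
}
```

After a successful squash, `commits_beyond_main ~/.sandlot/sandlot-test-repo/branch-ops-test` should print 1.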

Test 4.5: `squash` with no commits

sandlot-rs new fresh-branch
# Exit Claude immediately without making changes
sandlot-rs squash fresh-branch

Expect: ✖ Branch "fresh-branch" has no commits beyond main. Exit code 1.

Test 4.6: `squash` with dirty worktree

echo "dirty" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs squash branch-ops-test

Expect: ✖ Branch "branch-ops-test" has unsaved changes. Run "sandlot save branch-ops-test" first. Exit code 1.

Test 4.7: `rebase`

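Set up a scenario where main has advanced. The worktree must be clean, so first discard the change left behind by Test 4.6:

# Discard the leftover edit from Test 4.6 (rebase rejects a dirty worktree, see Test 4.8)
git -C ~/.sandlot/sandlot-test-repo/branch-ops-test checkout -- test.txt
# In the main repo, add a commit to main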
cd /tmp/sandlot-test-repo
echo "main change" > main-file.txt
git add . && git commit -m "advance main"

sandlot-rs rebase branch-ops-test

Expect (clean rebase):

Spinner: [branch-ops-test] Fetching origin -> Rebasing onto origin/main -> ✔ [branch-ops-test] Rebased branch-ops-test onto main

Expect (with conflicts):

◆ Rebase conflicts in N file(s). Resolving with Claude...
Spinner: [branch-ops-test] Starting container -> (1/N) Resolving <file> (round 1) -> ✔ [branch-ops-test] Rebased branch-ops-test onto main (resolved N conflict round(s))

Test 4.8: `rebase` with dirty worktree

echo "dirty" >> ~/.sandlot/sandlot-test-repo/branch-ops-test/test.txt
sandlot-rs rebase branch-ops-test

Expect: ✖ Branch "branch-ops-test" has unsaved changes. Run "sandlot save branch-ops-test" first. Exit code 1.

Test 4.9: `merge`

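# Discard the leftover edit from Test 4.8 (merge rejects a dirty session, see Test 4.12)
git -C ~/.sandlot/sandlot-test-repo/branch-ops-test checkout -- test.txt
cd /tmp/sandlot-test-repo
git checkout main
sandlot-rs merge branch-ops-test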

Expect (clean merge):

Spinner: Merging branch-ops-test -> ✔ Merged branch-ops-test into main
Session is torn down (worktree removed, symlink removed, state cleared)
Local branch is deleted

Expect (with conflicts):

Spinner: Resolving N conflict(s) -> Starting container -> (1/N) Resolving <file> -> ✔ Resolved N conflict(s) and merged branch-ops-test
Same cleanup as clean merge

Test 4.10: `merge` not on main

git checkout -b other-branch
sandlot-rs merge some-branch

Expect: ✖ You must be on "main" to merge. Currently on "other-branch". Use --force to merge into "other-branch" anyway. Exit code 1.

Test 4.11: `merge --force` on non-main

sandlot-rs merge some-branch --force

Expect: Merge proceeds into other-branch instead of main.

Test 4.12: `merge` with dirty session

echo "dirty" >> ~/.sandlot/sandlot-test-repo/some-branch/file.txt
sandlot-rs merge some-branch

Expect: ✖ Branch "some-branch" has unsaved changes. Run "sandlot save some-branch" first. Exit code 1.

Phase 5: Review

Test 5.1: `review` interactive

sandlot-rs review branch-ops-test

Expect:

Spinner: [branch-ops-test] Starting container -> ✔ [branch-ops-test] Session ready
Claude launches with the review prompt (4-agent grumpy senior engineer review)
state.json shows in_review: true during the review
After exit: in_review is cleared, auto-save runs

Test 5.2: `review --print`

sandlot-rs review branch-ops-test --print

Expect:

Spinner: [branch-ops-test] Starting container -> Running review...
Review output printed to stdout (not interactive)
No auto-save after

Test 5.3: `review` with extra prompt

sandlot-rs review branch-ops-test "also check for SQL injection"

Expect: The extra text is appended to the review prompt. Claude receives both the standard review instructions and the additional context.

Phase 6: Shell and Edit

Test 6.1: `shell` with branch

sandlot-rs shell branch-ops-test

Expect: Interactive fish shell opens in the worktree directory inside the container. pwd should show the container-translated worktree path.

Test 6.2: `shell` without branch

sandlot-rs shell

Expect: Interactive fish shell opens at a default location (no --workdir flag).

Test 6.3: `edit`

export EDITOR=vim
sandlot-rs edit branch-ops-test test.txt

Expect: vim opens the file at the worktree path. After closing, exits cleanly.

Test 6.4: `edit` with missing EDITOR

unset EDITOR
sandlot-rs edit branch-ops-test test.txt

Expect: ✖ $EDITOR is not set. Exit code 1.

Test 6.5: `edit` with missing file

export EDITOR=vim
sandlot-rs edit branch-ops-test nonexistent.txt

Expect: ✖ File not found: nonexistent.txt Exit code 1.

Test 6.6: `edit` path escape attempt

sandlot-rs edit branch-ops-test ../../etc/passwd

Expect: Error (path escapes the worktree). The exact message may vary but should prevent access.

Phase 7: Close and Checkout

Test 7.1: `close` clean session

sandlot-rs close test-branch-1

Expect:

✔ Closed session test-branch-1 on stdout
Worktree removed from ~/.sandlot/...
Symlink removed from .sandlot/test-branch-1
Session removed from state.json
Local branch deleted
Exit code 0

Test 7.2: `close` dirty session

# Set up a dirty session first
sandlot-rs new dirty-test
echo "uncommitted" > ~/.sandlot/sandlot-test-repo/dirty-test/uncommitted.txt
sandlot-rs close dirty-test

Expect: ✖ Branch "dirty-test" has unsaved changes. Run "sandlot save dirty-test" first, or use -f to force. Exit code 1.

Test 7.3: `close --force` dirty session

sandlot-rs close dirty-test --force

Expect: ✔ Closed session dirty-test. Session is torn down despite uncommitted changes.

Test 7.4: `rm` alias

sandlot-rs rm some-branch

Expect: Identical to close. The rm command is a hidden alias.

Test 7.5: `close` nonexistent session

sandlot-rs close nonexistent-xyz

Expect: ✖ No session found for branch "nonexistent-xyz". Exit code 1.

Test 7.6: `checkout`

sandlot-rs new checkout-test
# Make a commit
echo "data" > ~/.sandlot/sandlot-test-repo/checkout-test/data.txt
cd ~/.sandlot/sandlot-test-repo/checkout-test && git add . && git commit -m "data"
cd /tmp/sandlot-test-repo
sandlot-rs checkout checkout-test

Expect:

✔ Checked out checkout-test
Session torn down (worktree, symlink, state removed)
git branch in main repo shows you're now on checkout-test
Branch is NOT deleted (unlike close and merge)

Test 7.7: `checkout` with dirty main worktree

echo "dirty" > /tmp/sandlot-test-repo/dirty.txt
sandlot-rs checkout some-branch

Expect: ✖ Working tree has uncommitted changes that may conflict with checkout. Commit or stash them first, or use -f to force. Exit code 1.

Test 7.8: `checkout --force` with dirty main worktree

sandlot-rs checkout some-branch --force

Expect: Proceeds despite dirty working tree.

Phase 8: Cleanup and Upgrade

Test 8.1: `cleanup` with stale sessions

# Create a session, then manually delete the worktree
sandlot-rs new stale-test
rm -rf ~/.sandlot/sandlot-test-repo/stale-test
sandlot-rs cleanup

Expect: ✔ Removed stale session: stale-test. Session removed from state.json.

Test 8.2: `cleanup` with no stale sessions

sandlot-rs cleanup

Expect: No stale sessions found. (or No sessions to clean up. if no sessions at all).

Test 8.3: `upgrade`

sandlot-rs upgrade

Expect: Attempts to upgrade sandlot. Compare behavior with sandlot-ts upgrade. Both should attempt the same upgrade mechanism.

Phase 9: List Status Resolution

This tests that list correctly resolves session status.

Test 9.1: Idle session

sandlot-rs new idle-test
# Exit Claude immediately, no changes
sandlot-rs list

Expect: idle-test shows ◯ (dim circle) = idle.

Test 9.2: Dirty session

echo "dirty" > ~/.sandlot/sandlot-test-repo/idle-test/dirty.txt
sandlot-rs list

Expect: idle-test shows ◐ (yellow half-circle) = unsaved.

Test 9.3: Saved session

cd ~/.sandlot/sandlot-test-repo/idle-test
git add . && git commit -m "save"
cd /tmp/sandlot-test-repo
sandlot-rs list

Expect: idle-test shows ● (green circle) = saved.
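
Tests 9.1 to 9.3 suggest that these three statuses can be predicted from git state alone. The sketch below encodes that reading so the expected status can be computed independently of sandlot. The rule itself is an assumption inferred from the tests, not taken from either implementation, and it does not cover the active or review statuses. The function name `expected_status` is hypothetical.

```shell
# expected_status <worktree>
# Prints unsaved, saved, or idle based on git state alone (assumed rule).
expected_status() {
  if [ -n "$(git -C "$1" status --porcelain)" ]; then
    echo unsaved   # uncommitted or untracked files present
  elif [ "$(git -C "$1" rev-list --count main..HEAD)" -gt 0 ]; then
    echo saved     # clean worktree with commits beyond main
  else
    echo idle      # clean worktree, nothing beyond main
  fi
}
```

Compare the result of `expected_status ~/.sandlot/sandlot-test-repo/idle-test` with the icon that `list` shows after each of the three steps.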

Test 9.4: `list --all`

sandlot-rs list --all

Expect: Sessions grouped by repo name with headers:

── repo-name ──
  BRANCH  PROMPT
◯ branch  prompt text

Test 9.5: `list` with no sessions

# Close all sessions first
sandlot-rs list

Expect: ◆ No active sessions.

Test 9.6: `list` with VM down

sandlot-rs vm stop
sandlot-rs list

Expect: Normal session list (all show as idle since VM can't check status), plus:

VM is not running.    (in red)

Phase 10: End-to-End Comparison

For each command tested above, run the same scenario with both sandlot-ts and sandlot-rs and compare:

Exit codes must be identical
Stdout content must be semantically identical (exact match after stripping ANSI if formatting differs)
Stderr content must match (error messages, spinner final lines)
Side effects must match:
- Same files created/deleted
- Same git state (branches, worktrees, commits)
- Same state.json content (modulo timestamps)
- Same container state

Comparison script

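#!/bin/bash
# Compare a command between TS and Rust.
# Aliases are not expanded in non-interactive scripts, so these functions
# mirror the aliases from Setup step 2 (adjust the paths to your checkout).
sandlot_ts() { bun run /path/to/rust-rewrite/src/cli.ts "$@"; }
sandlot_rs() { /path/to/rust-rewrite/rust-sandlot/target/release/sandlot "$@"; }

echo "=== TypeScript ==="
# "$@" keeps each argument intact, including prompts that contain spaces
sandlot_ts "$@" 2>/tmp/ts-stderr; TS_EXIT=$?
echo "EXIT: $TS_EXIT"
cat /tmp/ts-stderr

echo ""
echo "=== Rust ==="
sandlot_rs "$@" 2>/tmp/rs-stderr; RS_EXIT=$?
echo "EXIT: $RS_EXIT"
cat /tmp/rs-stderr

echo ""
if [ "$TS_EXIT" = "$RS_EXIT" ]; then
  echo "EXIT CODES: MATCH ($TS_EXIT)"
else
  echo "EXIT CODES: MISMATCH (ts=$TS_EXIT rs=$RS_EXIT)"
fi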

Known Differences to Accept

Timestamps in state.json will differ between runs (different created_at values). Compare structure and non-timestamp fields only.
Spinner frame timing may differ slightly. Only compare the final spinner message.
AI-generated content (branch names from prompts, commit messages, conflict resolutions, reviews) will differ between runs since they involve LLM calls. Verify the format is correct, not the exact text.
Random branch names from sandlot new (no args) will differ. Verify the format is adjective-noun from the same word lists.
Order of JSON object keys may differ between serde_json (Rust) and JSON.stringify (TS). Compare semantically.
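
Two of the differences above (timestamps and key order) can be neutralized before diffing JSON. The sketch below assumes jq is installed (it is not in the prerequisites; `brew install jq` is one way to get it) and that timestamps live in fields named created_at, as in the list --json output. The function name `normalize_json` is hypothetical.

```shell
# normalize_json: sort object keys and drop every created_at field so two
# JSON documents can be compared structurally. Reads stdin, writes stdout.
normalize_json() {
  jq -S 'walk(if type == "object" then del(.created_at) else . end)'
}
```

For example, save `normalize_json < .sandlot/state.json` to a file after the TypeScript run, repeat after the Rust run, and diff the two files.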

What Must Be Identical

All error messages (exact wording, Unicode markers)
Exit codes for all error and success paths
File paths (worktree locations, symlink targets, state file location)
Git operations (same branches created/deleted, same merge behavior)
Container commands (same container exec invocations, same environment variables)
Flag parsing (-f, --force, -p, --print, -n, --no-save, --json, -a, --all)
Default behavior (no args = list)
Shell init output (init fish, init bash, init zsh) -- these were already verified byte-for-byte identical
Fish/bash/zsh completions -- already verified byte-for-byte identical
