forked from probablycorey/baudy
Add README with project context and link from CLAUDE.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 8e3179f7d3
commit 9c20ce7d9c
@@ -1,3 +1,6 @@
See [README.md](README.md) for project context, goals, and architecture.
See [docs/ggwave-gotchas.md](docs/ggwave-gotchas.md) for platform-specific pitfalls.

# Toes - Guide to Writing Apps

Toes manages and runs web apps, each on its own port.
README.md (normal file, 37 lines)
@@ -0,0 +1,37 @@
# ggwave Audio POC

Proof-of-concept for data-over-sound communication using the [ggwave](https://github.com/ggerganov/ggwave) library. This validates the browser-to-server audio pipeline that will eventually be used for WiFi provisioning on Raspberry Pi (Toes devices).
## Why

Toes devices (Raspberry Pis) need a way to receive WiFi credentials during initial setup. The device has no network connection yet, so we can't use HTTP. Instead, the user's phone encodes the credentials as audio chirps and plays them through its speaker. The Pi's microphone picks up the chirps and the Pi decodes them. No Bluetooth pairing, no QR codes, no special hardware: just sound.
## What this POC does

It's a calculator. The phone sends math expressions as audio; the server decodes them and sends back the answer. This is a minimal end-to-end test of the full pipeline:

1. **Phone (browser)**: calculator UI. The user types `78*5` and hits `=`. The expression is encoded as an audible chirp using ggwave's AUDIBLE_FAST protocol and played through the phone speaker.
2. **Server (Bun)**: listens on the microphone via `sox` and feeds audio frames to ggwave for decoding. When it decodes an expression, it evaluates it and sends the result back via Server-Sent Events (SSE).
3. **Phone receives result**: the answer (`390`) appears on the calculator display.

The server also chirps the result back through its speakers. This is half-duplex: it stops listening while playing to avoid feedback.
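
The README does not say how the server evaluates a decoded expression. Since the input arrives over the air and is effectively untrusted, one option is a small parser instead of `eval()`. The following is a minimal sketch under that assumption; `evaluateExpression` is a hypothetical helper, not a function from this repository, and it only handles `+`, `-`, `*`, `/`, unary minus, and parentheses.

```typescript
// Hypothetical helper (not from the repository): a tiny recursive-descent
// evaluator for + - * / and parentheses. It never calls eval().
function evaluateExpression(input: string): number {
  const src = input.replace(/\s+/g, ""); // ignore whitespace
  let pos = 0;

  // expression := term (("+" | "-") term)*
  function parseExpression(): number {
    let value = parseTerm();
    while (src[pos] === "+" || src[pos] === "-") {
      const op = src[pos++];
      const rhs = parseTerm();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // term := factor (("*" | "/") factor)*
  function parseTerm(): number {
    let value = parseFactor();
    while (src[pos] === "*" || src[pos] === "/") {
      const op = src[pos++];
      const rhs = parseFactor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor := "-" factor | "(" expression ")" | number
  function parseFactor(): number {
    if (src[pos] === "-") {
      pos++;
      return -parseFactor();
    }
    if (src[pos] === "(") {
      pos++;
      const value = parseExpression();
      if (src[pos] !== ")") throw new Error("expected ')'");
      pos++;
      return value;
    }
    const match = /^\d+(\.\d+)?/.exec(src.slice(pos));
    if (!match) throw new Error(`unexpected input at position ${pos}`);
    pos += match[0].length;
    return Number(match[0]);
  }

  const result = parseExpression();
  // Reject trailing garbage such as "2+3)".
  if (pos !== src.length) throw new Error(`unexpected input at position ${pos}`);
  return result;
}
```

With this sketch, the README's example `evaluateExpression("78*5")` yields `390`, and malformed input such as `"2+"` throws instead of producing a result.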
## How to run

```
cd tmp
bun install
bun run server.ts
```

Open `http://<hostname>:8888` on your phone. Make sure the server machine's default audio input is a working microphone (on macOS, check System Settings > Sound > Input).
## How it works

- **ggwave** handles encoding/decoding using multi-frequency FSK modulation with Reed-Solomon error correction. The AUDIBLE_FAST protocol uses audible frequencies (~1-6 kHz range).
- **Browser side** uses the Web Audio API to play encoded waveforms. ggwave runs as WASM.
- **Server side** uses `sox -d` to capture mic audio as raw 48 kHz float32 samples, then feeds frames to ggwave for decoding.
- **Half-duplex**: both sides use the same frequency band, so only one can transmit at a time. The server stops processing mic input while playing back results.
- **SSE** is used as a reliable fallback channel to push results to the phone (rather than trying to decode audio on the phone in a noisy environment).

See [docs/ggwave-gotchas.md](docs/ggwave-gotchas.md) for hard-won lessons about iOS audio, macOS mic permissions, WASM heap management, and sample rate matching.
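
To illustrate the server-side and half-duplex points above: a raw byte stream from `sox` arrives in arbitrarily sized chunks, so it has to be regrouped into whole float32 frames before decoding. The sketch below is one possible way to do that, not the repository's actual code. `Float32Framer`, its `muted` flag, and the frame size are all hypothetical names and values chosen for illustration, and the decoder call itself is left out.

```typescript
// Hypothetical helper (not from the repository): regroups arbitrary byte
// chunks, e.g. from a `sox` stdout stream, into fixed-size float32 frames.
class Float32Framer {
  // Bytes left over from the previous chunk that did not fill a frame.
  private pending = new Uint8Array(0);
  private readonly samplesPerFrame: number;
  // Half-duplex switch: while true, incoming mic audio is discarded.
  muted = false;

  constructor(samplesPerFrame: number) {
    this.samplesPerFrame = samplesPerFrame;
  }

  push(chunk: Uint8Array): Float32Array[] {
    if (this.muted) {
      // We are transmitting: drop the audio and any partial frame.
      this.pending = new Uint8Array(0);
      return [];
    }

    // Prepend leftover bytes from the previous call.
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);

    const frameBytes = this.samplesPerFrame * 4; // 4 bytes per float32 sample
    const frames: Float32Array[] = [];
    let offset = 0;
    while (merged.length - offset >= frameBytes) {
      // slice() copies into a fresh buffer, so the view below is aligned.
      // Samples are read in the platform's native byte order.
      const bytes = merged.slice(offset, offset + frameBytes);
      frames.push(new Float32Array(bytes.buffer));
      offset += frameBytes;
    }
    this.pending = merged.slice(offset);
    return frames;
  }
}
```

In use, the server would set `muted = true` before playing a result chirp and back to `false` afterwards, and pass each returned frame to the ggwave decoder. Dropping the partial frame while muted keeps stale bytes from being glued onto fresh audio when listening resumes.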