Eddie Hudson committed on
Commit 6af0b1a · 0 Parent(s)

Initial commit

.env.example ADDED
@@ -0,0 +1,2 @@
CLAWDBOT_TOKEN="PUT YOUR CLAWDBOT TOKEN HERE"
ELEVENLABS_API_KEY="PUT YOUR ELEVENLABS API KEY HERE"
.gitignore ADDED
@@ -0,0 +1,7 @@
.venv/
*.egg-info/
*.code-workspace
.env
build/
dist/
__pycache__/
.python-version ADDED
@@ -0,0 +1 @@
3.12
LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md ADDED
@@ -0,0 +1,212 @@
---
title: Moltbot Body
emoji: 🤖
colorFrom: green
colorTo: blue
sdk: static
pinned: false
short_description: Give Moltbot a physical presence with Reachy Mini
tags:
  - reachy_mini
  - reachy_mini_python_app
  - clawdbot
  - moltbot
---

# Moltbot's Body

> **Security Warning**: This project uses Moltbot, which runs AI-generated code with access to your system. Ensure you understand the security implications before installation. Only run Moltbot from trusted sources and review its permissions carefully. See the [Moltbot Security documentation](https://docs.molt.bot/gateway/security) for details.

Reachy Mini integration with Moltbot — giving Moltbot a physical presence.

## What is Moltbot?

[Moltbot](https://docs.molt.bot/start/getting-started) is an AI assistant platform that can connect to various chat surfaces (WhatsApp, Telegram, Discord, etc.) and execute tasks autonomously. This project extends Moltbot by giving it a physical robot body using [Reachy Mini](https://huggingface.co/spaces/pollen-robotics/Reachy_Mini), a small expressive robot from Pollen Robotics.

With this integration, Moltbot can:
- Listen to speech via the robot's microphone
- Transcribe speech locally using Whisper
- Generate responses through the Moltbot gateway
- Speak responses through ElevenLabs TTS
- Move its head expressively while speaking

## Architecture

```
Microphone → VAD → Whisper STT → Moltbot Gateway → ElevenLabs TTS → Speaker

MovementManager
HeadWobbler (speech-driven head movement)
```
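
The pipeline above can be sketched as a plain function composition. The stage names below are illustrative stubs, not this project's actual API; the real app uses faster-whisper, the Moltbot gateway, and ElevenLabs streaming for the three stages.

```python
# Hypothetical sketch of one conversation turn: STT -> LLM -> TTS.
# Each stage is a stub standing in for the real component.

def transcribe(audio: bytes) -> str:
    """Stub speech-to-text stage (real app: faster-whisper)."""
    return audio.decode("utf-8")

def ask_assistant(text: str) -> str:
    """Stub LLM stage (real app: streaming from the Moltbot gateway)."""
    return f"You said: {text}"

def synthesize(text: str) -> bytes:
    """Stub text-to-speech stage (real app: ElevenLabs streaming)."""
    return text.encode("utf-8")

def run_turn(audio: bytes) -> bytes:
    """One full turn through the pipeline."""
    return synthesize(ask_assistant(transcribe(audio)))

print(run_turn(b"hello"))  # b'You said: hello'
```

In the real app the movement branch (MovementManager and HeadWobbler) runs concurrently, driven by the TTS audio stream rather than by this sequential flow.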

## Prerequisites

Before running this project, you need:

### 1. Moltbot Gateway (Required)

Moltbot must be installed and the gateway must be running. Follow the [Moltbot Getting Started guide](https://docs.molt.bot/start/getting-started) to:

1. Install the CLI: `curl -fsSL https://molt.bot/install.sh | bash`
2. Run the onboarding wizard: `moltbot onboard --install-daemon`
3. Start the gateway: `moltbot gateway --port 18789`

Verify it's running:
```bash
moltbot gateway status
```

### 2. Reachy Mini Robot (Required)

You need a [Reachy Mini](https://huggingface.co/spaces/pollen-robotics/Reachy_Mini) robot from Pollen Robotics with its daemon running.

Verify the daemon is running:
```bash
curl -s http://localhost:8000/api/daemon/status | jq .state
```

### 3. ElevenLabs Account (Required)

Sign up at [ElevenLabs](https://elevenlabs.io/) and get an API key for text-to-speech.

### 4. Python 3.12+ and uv

This project requires Python 3.12 or later and uses [uv](https://docs.astral.sh/uv/) for package management.

## Setup

```bash
git clone <this-repo>
cd reachy
uv sync
```

### Environment Variables

Create a `.env` file:

```bash
CLAWDBOT_TOKEN=your_gateway_token
ELEVENLABS_API_KEY=your_elevenlabs_key
```

Get your gateway token from the Moltbot configuration. If these variables are not set, they are pulled from the Moltbot config automatically.

## Running

```bash
# Make sure the Reachy Mini daemon is running
curl -s http://localhost:8000/api/daemon/status | jq .state

# Make sure the Moltbot gateway is running
moltbot gateway status

# Start Moltbot's body
uv run moltbot-body
```

## CLI Options

| Flag | Description |
|------|-------------|
| `--debug` | Enable debug logging (verbose output) |
| `--profile` | Enable the timing profiler; prints a detailed timing breakdown after each conversation turn |
| `--profile-once` | Profile one conversation turn, then exit (useful for benchmarking) |
| `--robot-name NAME` | Specify the robot name for connection (if you have multiple robots) |
| `--gateway-url URL` | Moltbot gateway URL (default: `http://localhost:18789`) |
118
+ ### Examples
119
+
120
+ ```bash
121
+ # Run with debug logging
122
+ uv run moltbot-body --debug
123
+
124
+ # Profile a single conversation turn
125
+ uv run moltbot-body --profile-once
126
+
127
+ # Connect to a specific robot and gateway
128
+ uv run moltbot-body --robot-name my-reachy --gateway-url http://192.168.1.100:18789
129
+ ```
130
+
131
+ ### Profiling Output
132
+
133
+ When using `--profile` or `--profile-once`, you'll see a detailed timing breakdown after each turn:
134
+
135
+ ```
136
+ ============================================================
137
+ CONVERSATION TIMING PROFILE
138
+ ============================================================
139
+
140
+ 📝 User: "Hello, how are you?"
141
+ 🤖 Assistant: "I'm doing well, thank you for asking!"
142
+
143
+ ------------------------------------------------------------
144
+ TIMING BREAKDOWN
145
+ ------------------------------------------------------------
146
+
147
+ 🎤 Speech Detection:
148
+ Duration spoken: 1.23s
149
+
150
+ 📜 Whisper Transcription:
151
+ Time: 0.45s
152
+
153
+ 🧠 LLM (Moltbot):
154
+ Time to first token: 0.32s
155
+ Streaming time: 1.15s
156
+ Total time: 1.47s
157
+ Tokens: 42 (36.5 tok/s)
158
+
159
+ 🔊 TTS (ElevenLabs):
160
+ Time to first audio: 0.28s
161
+ Total streaming: 1.82s
162
+ Audio chunks: 15
163
+
164
+ ------------------------------------------------------------
165
+ END-TO-END LATENCY
166
+ ------------------------------------------------------------
167
+
168
+ ⏱️ Speech end → First audio: 1.05s
169
+ ⏱️ Total turn time: 4.50s
170
+
171
+ ============================================================
172
+ ```
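
The derived numbers in this sample are easy to sanity-check: tokens per second is the token count divided by the streaming time, and "speech end → first audio" is roughly transcription time plus time-to-first-token plus time-to-first-audio:

```python
# Tokens per second from the LLM section above.
tokens, streaming_s = 42, 1.15
print(round(tokens / streaming_s, 1))  # → 36.5

# Speech end → first audio: STT + LLM time-to-first-token + TTS time-to-first-audio.
stt, ttft, tt_first_audio = 0.45, 0.32, 0.28
print(round(stt + ttft + tt_first_audio, 2))  # → 1.05
```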

## Features

- **Voice Activation**: Listens for speech and processes it once silence is detected
- **Whisper STT**: Local speech-to-text transcription using faster-whisper
- **Moltbot Brain**: Claude-powered responses via the Moltbot gateway API
- **ElevenLabs TTS**: Natural voice output with streaming
- **Head Wobble**: Audio-driven head movement while speaking for natural expressiveness
- **Movement Manager**: 100 Hz control loop for smooth robot motion
- **Breathing Animation**: Gentle idle breathing when not actively engaged

## Tips for a Better Experience

### Use a Low-Latency Inference Provider

For natural, conversational interactions, response latency is critical. The time from when you stop speaking to when the robot starts responding should ideally be under 1 second.

Consider using a fast inference provider like [Groq](https://groq.com/), which offers extremely low latency for supported models. You can configure this in your Moltbot settings. Use the `--profile` flag to measure your end-to-end latency and identify bottlenecks.

### Let Moltbot Help You Set Up

Since Moltbot is an AI coding assistant, you can chat with it to help configure and customize the robot body! Try asking Moltbot (via any of its chat surfaces) to:

- Help you tune the head movement parameters
- Adjust the voice activation sensitivity
- Add new expressions or gestures
- Debug connection issues

Moltbot can read and modify this codebase, so it's a great collaborator for extending the robot's capabilities.

## Roadmap

- [ ] Face tracking (look at the person speaking)
- [ ] DoA-based head tracking (direction of arrival for speaker localization)
- [ ] Wake word detection
- [ ] Expression gestures

## License

MIT License - see [LICENSE](LICENSE) for details.
index.html ADDED
@@ -0,0 +1,143 @@
<!doctype html>
<html>

<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>Moltbot Body - Reachy Mini App</title>
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=Manrope:wght@400;500;600&display=swap" rel="stylesheet">
  <link rel="stylesheet" href="style.css" />
</head>

<body>
  <header class="hero">
    <div class="topline">
      <div class="brand">
        <span class="logo">🤖</span>
        <span class="brand-name">Moltbot Body</span>
      </div>
      <div class="pill">Voice conversation · Moltbot AI · Expressive motion</div>
    </div>
    <div class="hero-grid">
      <div class="hero-copy">
        <p class="eyebrow">Reachy Mini App</p>
        <h1>Give Moltbot a physical presence.</h1>
        <p class="lede">
          Connect your Moltbot AI assistant to a Reachy Mini robot. Listen through the microphone, transcribe with Whisper, respond through Moltbot, and speak with natural TTS—all while moving expressively.
        </p>
        <div class="hero-actions">
          <a class="btn primary" href="#highlights">Explore features</a>
          <a class="btn ghost" href="#architecture">See how it works</a>
        </div>
        <div class="hero-badges">
          <span>Local Whisper STT</span>
          <span>Moltbot Gateway</span>
          <span>ElevenLabs TTS</span>
          <span>Expressive head movement</span>
        </div>
      </div>
      <div class="hero-visual">
        <div class="glass-card">
          <div class="architecture-preview">
            <pre>
Microphone → VAD → Whisper STT

Moltbot Gateway

ElevenLabs TTS → Speaker

MovementManager
HeadWobbler
            </pre>
          </div>
          <p class="caption">End-to-end voice conversation pipeline with expressive robot motion.</p>
        </div>
      </div>
    </div>
  </header>

  <section id="highlights" class="section features">
    <div class="section-header">
      <p class="eyebrow">What's inside</p>
      <h2>A complete voice interface for your robot</h2>
      <p class="intro">
        Moltbot Body combines speech recognition, AI conversation, and expressive motion into a seamless experience.
      </p>
    </div>
    <div class="feature-grid">
      <div class="feature-card">
        <span class="icon">🎤</span>
        <h3>Voice activation</h3>
        <p>Listens continuously and detects when you're speaking using voice activity detection.</p>
      </div>
      <div class="feature-card">
        <span class="icon">📝</span>
        <h3>Local transcription</h3>
        <p>Fast, private speech-to-text using Whisper running locally on your machine.</p>
      </div>
      <div class="feature-card">
        <span class="icon">🧠</span>
        <h3>Moltbot brain</h3>
        <p>Claude-powered responses through the Moltbot gateway with full tool access and memory.</p>
      </div>
      <div class="feature-card">
        <span class="icon">🔊</span>
        <h3>Natural voice</h3>
        <p>High-quality streaming text-to-speech through ElevenLabs for natural conversation.</p>
      </div>
      <div class="feature-card">
        <span class="icon">💃</span>
        <h3>Expressive motion</h3>
        <p>Audio-driven head wobble and breathing animations bring the robot to life while speaking.</p>
      </div>
      <div class="feature-card">
        <span class="icon">⚡</span>
        <h3>Low latency</h3>
        <p>Optimized pipeline with profiling tools to measure and minimize response time.</p>
      </div>
    </div>
  </section>

  <section id="architecture" class="section story">
    <div class="story-grid">
      <div class="story-card">
        <p class="eyebrow">How it works</p>
        <h3>From speech to response in under a second</h3>
        <ul class="story-list">
          <li><span>🎤</span> Robot microphone captures your voice continuously.</li>
          <li><span>🔇</span> Voice Activity Detection identifies when you stop speaking.</li>
          <li><span>📝</span> Whisper transcribes your speech locally and privately.</li>
          <li><span>🧠</span> Moltbot gateway processes your message with full AI capabilities.</li>
          <li><span>🔊</span> ElevenLabs streams natural voice output in real-time.</li>
          <li><span>🤖</span> Head wobbles expressively while the robot speaks.</li>
        </ul>
      </div>
      <div class="story-card secondary">
        <p class="eyebrow">Prerequisites</p>
        <h3>What you need to get started</h3>
        <p class="story-text">
          This app requires a running Moltbot gateway, an ElevenLabs API key for TTS, and a Reachy Mini robot connected to your network.
        </p>
        <div class="chips">
          <span class="chip">Moltbot Gateway</span>
          <span class="chip">ElevenLabs API</span>
          <span class="chip">Reachy Mini</span>
          <span class="chip">Python 3.12+</span>
        </div>
      </div>
    </div>
  </section>

  <footer class="footer">
    <p>
      Moltbot Body — giving Moltbot a physical presence with Reachy Mini.
      Learn more about <a href="https://docs.molt.bot/" target="_blank" rel="noopener">Moltbot</a> and
      <a href="https://huggingface.co/spaces/pollen-robotics/Reachy_Mini" target="_blank" rel="noopener">Reachy Mini</a>.
    </p>
  </footer>

</body>

</html>
moltbot_body/__init__.py ADDED
@@ -0,0 +1,3 @@
"""Moltbot's physical body - Reachy Mini integration with Clawdbot."""

__version__ = "0.1.0"
moltbot_body/audio/__init__.py ADDED
@@ -0,0 +1 @@
"""Audio processing modules for head movement."""
moltbot_body/audio/head_wobbler.py ADDED
@@ -0,0 +1,171 @@
"""Moves head given audio samples."""

import time
import queue
import base64
import logging
import threading
from typing import Tuple
from collections.abc import Callable

import numpy as np
from numpy.typing import NDArray

from moltbot_body.audio.speech_tapper import HOP_MS, SwayRollRT


SAMPLE_RATE = 24000
MOVEMENT_LATENCY_S = 0.2  # seconds between audio and robot movement
logger = logging.getLogger(__name__)


class HeadWobbler:
    """Converts audio deltas (base64) into head movement offsets."""

    def __init__(self, set_speech_offsets: Callable[[Tuple[float, float, float, float, float, float]], None]) -> None:
        """Initialize the head wobbler."""
        self._apply_offsets = set_speech_offsets
        self._base_ts: float | None = None
        self._hops_done: int = 0

        self.audio_queue: "queue.Queue[Tuple[int, int, NDArray[np.int16]]]" = queue.Queue()
        self.sway = SwayRollRT()

        # Synchronization primitives
        self._state_lock = threading.Lock()
        self._sway_lock = threading.Lock()
        self._generation = 0

        self._stop_event = threading.Event()
        self._thread: threading.Thread | None = None

    def feed(self, delta_b64: str) -> None:
        """Thread-safe: push audio into the consumer queue."""
        buf = np.frombuffer(base64.b64decode(delta_b64), dtype=np.int16).reshape(1, -1)
        with self._state_lock:
            generation = self._generation
        self.audio_queue.put((generation, SAMPLE_RATE, buf))

    def start(self) -> None:
        """Start the head wobbler loop in a thread."""
        self._stop_event.clear()
        self._thread = threading.Thread(target=self.working_loop, daemon=True)
        self._thread.start()
        logger.debug("Head wobbler started")

    def stop(self) -> None:
        """Stop the head wobbler loop."""
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
        logger.debug("Head wobbler stopped")

    def working_loop(self) -> None:
        """Convert audio deltas into head movement offsets."""
        hop_dt = HOP_MS / 1000.0

        logger.debug("Head wobbler thread started")
        while not self._stop_event.is_set():
            queue_ref = self.audio_queue
            try:
                chunk_generation, sr, chunk = queue_ref.get_nowait()  # (gen, sr, data)
            except queue.Empty:
                # sleep while idle so the loop does not spin
                time.sleep(MOVEMENT_LATENCY_S)
                continue

            try:
                with self._state_lock:
                    current_generation = self._generation
                if chunk_generation != current_generation:
                    continue

                if self._base_ts is None:
                    with self._state_lock:
                        if self._base_ts is None:
                            self._base_ts = time.monotonic()

                pcm = np.asarray(chunk).squeeze(0)
                with self._sway_lock:
                    results = self.sway.feed(pcm, sr)

                i = 0
                while i < len(results):
                    with self._state_lock:
                        if self._generation != current_generation:
                            break
                        base_ts = self._base_ts
                        hops_done = self._hops_done

                    if base_ts is None:
                        base_ts = time.monotonic()
                        with self._state_lock:
                            if self._base_ts is None:
                                self._base_ts = base_ts
                            hops_done = self._hops_done

                    target = base_ts + MOVEMENT_LATENCY_S + hops_done * hop_dt
                    now = time.monotonic()

                    if now - target >= hop_dt:
                        lag_hops = int((now - target) / hop_dt)
                        drop = min(lag_hops, len(results) - i - 1)
                        if drop > 0:
                            with self._state_lock:
                                self._hops_done += drop
                                hops_done = self._hops_done
                            i += drop
                            continue

                    if target > now:
                        time.sleep(target - now)
                        with self._state_lock:
                            if self._generation != current_generation:
                                break

                    r = results[i]
                    offsets = (
                        r["x_mm"] / 1000.0,
                        r["y_mm"] / 1000.0,
                        r["z_mm"] / 1000.0,
                        r["roll_rad"],
                        r["pitch_rad"],
                        r["yaw_rad"],
                    )

                    with self._state_lock:
                        if self._generation != current_generation:
                            break

                    self._apply_offsets(offsets)

                    with self._state_lock:
                        self._hops_done += 1
                    i += 1
            finally:
                queue_ref.task_done()
        logger.debug("Head wobbler thread exited")

    def reset(self) -> None:
        """Reset the internal state."""
        with self._state_lock:
            self._generation += 1
            self._base_ts = None
            self._hops_done = 0

        # Drain any queued audio chunks from previous generations
        drained_any = False
        while True:
            try:
                _, _, _ = self.audio_queue.get_nowait()
            except queue.Empty:
                break
            else:
                drained_any = True
                self.audio_queue.task_done()

        with self._sway_lock:
            self.sway.reset()

        if drained_any:
            logger.debug("Head wobbler queue drained during reset")
moltbot_body/audio/speech_tapper.py ADDED
@@ -0,0 +1,268 @@
from __future__ import annotations

import math
from typing import Any, Dict, List
from itertools import islice
from collections import deque

import numpy as np
from numpy.typing import NDArray


# Tunables
SR = 16_000
FRAME_MS = 20
HOP_MS = 50

SWAY_MASTER = 1.5
SENS_DB_OFFSET = +4.0
VAD_DB_ON = -35.0
VAD_DB_OFF = -45.0
VAD_ATTACK_MS = 40
VAD_RELEASE_MS = 250
ENV_FOLLOW_GAIN = 0.65

SWAY_F_PITCH = 2.2
SWAY_A_PITCH_DEG = 4.5
SWAY_F_YAW = 0.6
SWAY_A_YAW_DEG = 7.5
SWAY_F_ROLL = 1.3
SWAY_A_ROLL_DEG = 2.25
SWAY_F_X = 0.35
SWAY_A_X_MM = 4.5
SWAY_F_Y = 0.45
SWAY_A_Y_MM = 3.75
SWAY_F_Z = 0.25
SWAY_A_Z_MM = 2.25

SWAY_DB_LOW = -46.0
SWAY_DB_HIGH = -18.0
LOUDNESS_GAMMA = 0.9
SWAY_ATTACK_MS = 50
SWAY_RELEASE_MS = 250

# Derived
FRAME = int(SR * FRAME_MS / 1000)
HOP = int(SR * HOP_MS / 1000)
ATTACK_FR = max(1, int(VAD_ATTACK_MS / HOP_MS))
RELEASE_FR = max(1, int(VAD_RELEASE_MS / HOP_MS))
SWAY_ATTACK_FR = max(1, int(SWAY_ATTACK_MS / HOP_MS))
SWAY_RELEASE_FR = max(1, int(SWAY_RELEASE_MS / HOP_MS))


def _rms_dbfs(x: NDArray[np.float32]) -> float:
    """Root-mean-square in dBFS for float32 mono array in [-1, 1]."""
    # numerically stable rms (avoid overflow)
    x = x.astype(np.float32, copy=False)
    rms = np.sqrt(np.mean(x * x, dtype=np.float32) + 1e-12, dtype=np.float32)
    return float(20.0 * math.log10(float(rms) + 1e-12))


def _loudness_gain(db: float, offset: float = SENS_DB_OFFSET) -> float:
    """Normalize dB into [0, 1] with gamma; clipped to [0, 1]."""
    t = (db + offset - SWAY_DB_LOW) / (SWAY_DB_HIGH - SWAY_DB_LOW)
    if t < 0.0:
        t = 0.0
    elif t > 1.0:
        t = 1.0
    return t**LOUDNESS_GAMMA if LOUDNESS_GAMMA != 1.0 else t


def _to_float32_mono(x: NDArray[Any]) -> NDArray[np.float32]:
    """Convert arbitrary PCM array to float32 mono in [-1, 1].

    Accepts shapes: (N,), (1,N), (N,1), (C,N), (N,C).
    """
    a = np.asarray(x)
    if a.ndim == 0:
        return np.zeros(0, dtype=np.float32)

    # If 2D, decide which axis is channels (prefer small first dim)
    if a.ndim == 2:
        # e.g., (channels, samples) if channels is small (<=8)
        if a.shape[0] <= 8 and a.shape[0] <= a.shape[1]:
            a = np.mean(a, axis=0)
        else:
            a = np.mean(a, axis=1)
    elif a.ndim > 2:
        a = np.mean(a.reshape(a.shape[0], -1), axis=0)

    # Now 1D, cast/scale
    if np.issubdtype(a.dtype, np.floating):
        return a.astype(np.float32, copy=False)
    # integer PCM
    info = np.iinfo(a.dtype)
    scale = float(max(-info.min, info.max))
    return a.astype(np.float32) / (scale if scale != 0.0 else 1.0)


def _resample_linear(x: NDArray[np.float32], sr_in: int, sr_out: int) -> NDArray[np.float32]:
    """Lightweight linear resampler for short buffers."""
    if sr_in == sr_out or x.size == 0:
        return x
    # guard tiny sizes
    n_out = int(round(x.size * sr_out / sr_in))
    if n_out <= 1:
        return np.zeros(0, dtype=np.float32)
    t_in = np.linspace(0.0, 1.0, num=x.size, dtype=np.float32, endpoint=True)
    t_out = np.linspace(0.0, 1.0, num=n_out, dtype=np.float32, endpoint=True)
    return np.interp(t_out, t_in, x).astype(np.float32, copy=False)


class SwayRollRT:
    """Feed audio chunks → per-hop sway outputs.

    Usage:
        rt = SwayRollRT()
        rt.feed(pcm_int16_or_float, sr) -> List[dict]
    """

    def __init__(self, rng_seed: int = 7):
        """Initialize state."""
        self._seed = int(rng_seed)
        self.samples: deque[float] = deque(maxlen=10 * SR)  # sliding window for VAD/env
        self.carry: NDArray[np.float32] = np.zeros(0, dtype=np.float32)

        self.vad_on = False
        self.vad_above = 0
        self.vad_below = 0

        self.sway_env = 0.0
        self.sway_up = 0
        self.sway_down = 0

        rng = np.random.default_rng(self._seed)
        self.phase_pitch = float(rng.random() * 2 * math.pi)
        self.phase_yaw = float(rng.random() * 2 * math.pi)
        self.phase_roll = float(rng.random() * 2 * math.pi)
        self.phase_x = float(rng.random() * 2 * math.pi)
        self.phase_y = float(rng.random() * 2 * math.pi)
        self.phase_z = float(rng.random() * 2 * math.pi)
        self.t = 0.0

    def reset(self) -> None:
        """Reset state (VAD/env/buffers/time) but keep initial phases/seed."""
        self.samples.clear()
        self.carry = np.zeros(0, dtype=np.float32)
        self.vad_on = False
        self.vad_above = 0
        self.vad_below = 0
        self.sway_env = 0.0
        self.sway_up = 0
        self.sway_down = 0
        self.t = 0.0

    def feed(self, pcm: NDArray[Any], sr: int | None) -> List[Dict[str, float]]:
        """Stream in a PCM chunk. Returns a list of sway dicts, one per hop (HOP_MS).

        Args:
            pcm: np.ndarray, shape (N,) or (C,N)/(N,C); int or float.
            sr: sample rate of `pcm` (None -> assume SR).

        """
        sr_in = SR if sr is None else int(sr)
        x = _to_float32_mono(pcm)
        if x.size == 0:
            return []
        if sr_in != SR:
            x = _resample_linear(x, sr_in, SR)
            if x.size == 0:
                return []

        # append to carry and consume fixed HOP chunks
        if self.carry.size:
            self.carry = np.concatenate([self.carry, x])
        else:
            self.carry = x

        out: List[Dict[str, float]] = []

        while self.carry.size >= HOP:
            hop = self.carry[:HOP]
            remaining: NDArray[np.float32] = self.carry[HOP:]
            self.carry = remaining

            # keep sliding window for VAD/env computation
            # (deque accepts any iterable; list() for small HOP is fine)
            self.samples.extend(hop.tolist())
            if len(self.samples) < FRAME:
                self.t += HOP_MS / 1000.0
                continue

            frame = np.fromiter(
                islice(self.samples, len(self.samples) - FRAME, len(self.samples)),
                dtype=np.float32,
                count=FRAME,
            )
            db = _rms_dbfs(frame)

            # VAD with hysteresis + attack/release
            if db >= VAD_DB_ON:
                self.vad_above += 1
                self.vad_below = 0
                if not self.vad_on and self.vad_above >= ATTACK_FR:
                    self.vad_on = True
            elif db <= VAD_DB_OFF:
                self.vad_below += 1
                self.vad_above = 0
                if self.vad_on and self.vad_below >= RELEASE_FR:
                    self.vad_on = False

            if self.vad_on:
                self.sway_up = min(SWAY_ATTACK_FR, self.sway_up + 1)
                self.sway_down = 0
            else:
                self.sway_down = min(SWAY_RELEASE_FR, self.sway_down + 1)
                self.sway_up = 0

            up = self.sway_up / SWAY_ATTACK_FR
            down = 1.0 - (self.sway_down / SWAY_RELEASE_FR)
            target = up if self.vad_on else down
            self.sway_env += ENV_FOLLOW_GAIN * (target - self.sway_env)
            # clamp
            if self.sway_env < 0.0:
                self.sway_env = 0.0
            elif self.sway_env > 1.0:
                self.sway_env = 1.0

            loud = _loudness_gain(db) * SWAY_MASTER
            env = self.sway_env
            self.t += HOP_MS / 1000.0

            # oscillators
            pitch = (
                math.radians(SWAY_A_PITCH_DEG)
                * loud
                * env
                * math.sin(2 * math.pi * SWAY_F_PITCH * self.t + self.phase_pitch)
            )
            yaw = (
                math.radians(SWAY_A_YAW_DEG)
                * loud
                * env
                * math.sin(2 * math.pi * SWAY_F_YAW * self.t + self.phase_yaw)
            )
            roll = (
                math.radians(SWAY_A_ROLL_DEG)
                * loud
                * env
                * math.sin(2 * math.pi * SWAY_F_ROLL * self.t + self.phase_roll)
            )
            x_mm = SWAY_A_X_MM * loud * env * math.sin(2 * math.pi * SWAY_F_X * self.t + self.phase_x)
            y_mm = SWAY_A_Y_MM * loud * env * math.sin(2 * math.pi * SWAY_F_Y * self.t + self.phase_y)
            z_mm = SWAY_A_Z_MM * loud * env * math.sin(2 * math.pi * SWAY_F_Z * self.t + self.phase_z)

            out.append(
                {
                    "pitch_rad": pitch,
                    "yaw_rad": yaw,
                    "roll_rad": roll,
                    "pitch_deg": math.degrees(pitch),
                    "yaw_deg": math.degrees(yaw),
                    "roll_deg": math.degrees(roll),
                    "x_mm": x_mm,
                    "y_mm": y_mm,
                    "z_mm": z_mm,
                },
            )

        return out
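
As a quick standalone check of the dBFS formula used by `_rms_dbfs` above: a full-scale sine wave has an RMS of 1/√2, which works out to about -3.01 dBFS. This snippet recomputes that in pure Python (no numpy) using the same 20·log10(rms) formula.

```python
import math

# One second of a full-scale 440 Hz sine at 16 kHz: an exact integer number
# of cycles, so the mean of sin^2 is exactly 0.5 and RMS is 1/sqrt(2).
samples = [math.sin(2 * math.pi * 440 * n / 16_000) for n in range(16_000)]
rms = math.sqrt(sum(s * s for s in samples) / len(samples))
print(round(20.0 * math.log10(rms), 2))  # → -3.01
```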
moltbot_body/clawdbot_handler.py ADDED
@@ -0,0 +1,672 @@
+ """Handler that bridges audio I/O with Clawdbot via Whisper STT and ElevenLabs TTS."""
+
+ import io
+ import os
+ import json
+ import time
+ import base64
+ import queue
+ import asyncio
+ import logging
+ import tempfile
+ import threading
+ from dataclasses import dataclass, field
+ from typing import TYPE_CHECKING, Tuple, Optional, Callable, AsyncIterator
+ from pathlib import Path
+
+ import httpx
+ import numpy as np
+ import soundfile as sf
+ import websockets
+ from httpx_sse import aconnect_sse
+ from numpy.typing import NDArray
+ from dotenv import load_dotenv, find_dotenv
+
+ if TYPE_CHECKING:
+     from moltbot_body.audio.head_wobbler import HeadWobbler
+
+ load_dotenv(find_dotenv())
+
+ logger = logging.getLogger(__name__)
+
+
+ @dataclass
+ class ConversationTiming:
+     """Tracks timing for a single conversation turn."""
+
+     # Speech detection
+     speech_start: float = 0.0
+     speech_end: float = 0.0
+
+     # Transcription
+     transcription_start: float = 0.0
+     transcription_end: float = 0.0
+
+     # LLM
+     llm_request_start: float = 0.0
+     llm_first_token: float = 0.0
+     llm_last_token: float = 0.0
+     llm_token_count: int = 0
+
+     # TTS
+     tts_websocket_open: float = 0.0
+     tts_first_audio: float = 0.0
+     tts_last_audio: float = 0.0
+     tts_audio_chunks: int = 0
+
+     # Overall
+     turn_start: float = 0.0
+     turn_end: float = 0.0
+
+     # Content
+     user_text: str = ""
+     assistant_text: str = ""
+
+     def print_summary(self) -> None:
+         """Print a formatted timing summary."""
+         print("\n" + "=" * 60)
+         print("CONVERSATION TIMING PROFILE")
+         print("=" * 60)
+
+         print(f"\n📝 User: \"{self.user_text[:80]}{'...' if len(self.user_text) > 80 else ''}\"")
+         print(f"🤖 Assistant: \"{self.assistant_text[:80]}{'...' if len(self.assistant_text) > 80 else ''}\"")
+
+         print("\n" + "-" * 60)
+         print("TIMING BREAKDOWN")
+         print("-" * 60)
+
+         # Speech duration
+         speech_duration = self.speech_end - self.speech_start if self.speech_end else 0
+         print("\n🎤 Speech Detection:")
+         print(f" Duration spoken: {speech_duration:.2f}s")
+
+         # Transcription
+         transcription_time = self.transcription_end - self.transcription_start if self.transcription_end else 0
+         print("\n📜 Whisper Transcription:")
+         print(f" Time: {transcription_time:.2f}s")
+
+         # LLM
+         if self.llm_first_token:
+             llm_ttft = self.llm_first_token - self.llm_request_start
+             llm_total = self.llm_last_token - self.llm_request_start if self.llm_last_token else 0
+             llm_streaming = self.llm_last_token - self.llm_first_token if self.llm_last_token else 0
+             tokens_per_sec = self.llm_token_count / llm_streaming if llm_streaming > 0 else 0
+             print("\n🧠 LLM (Clawdbot):")
+             print(f" Time to first token: {llm_ttft:.2f}s")
+             print(f" Streaming time: {llm_streaming:.2f}s")
+             print(f" Total time: {llm_total:.2f}s")
+             print(f" Tokens: {self.llm_token_count} ({tokens_per_sec:.1f} tok/s)")
+
+         # TTS
+         if self.tts_first_audio:
+             tts_ttfa = self.tts_first_audio - self.tts_websocket_open
+             tts_total = self.tts_last_audio - self.tts_websocket_open if self.tts_last_audio else 0
+             print("\n🔊 TTS (ElevenLabs):")
+             print(f" Time to first audio: {tts_ttfa:.2f}s")
+             print(f" Total streaming: {tts_total:.2f}s")
+             print(f" Audio chunks: {self.tts_audio_chunks}")
+
+         # End-to-end
+         print("\n" + "-" * 60)
+         print("END-TO-END LATENCY")
+         print("-" * 60)
+
+         if self.tts_first_audio and self.speech_end:
+             e2e_to_first_audio = self.tts_first_audio - self.speech_end
+             print(f"\n⏱️ Speech end → First audio: {e2e_to_first_audio:.2f}s")
+
+         total_turn = self.turn_end - self.turn_start if self.turn_end else 0
+         print(f"⏱️ Total turn time: {total_turn:.2f}s")
+
+         print("\n" + "=" * 60 + "\n")
+
+
+ # Audio settings
+ SAMPLE_RATE = 16000  # Whisper expects 16kHz
+ SILENCE_THRESHOLD = 0.015  # RMS threshold for silence detection
+ SILENCE_DURATION = 0.8  # Seconds of silence to end utterance (reduced for responsiveness)
+ MIN_SPEECH_DURATION = 0.3  # Minimum speech duration to process
+
+
+ class ClawdbotHandler:
+     """Handles the Clawdbot conversation loop with Whisper STT and ElevenLabs TTS."""
+
+     def __init__(
+         self,
+         gateway_url: str = "http://localhost:18789",
+         gateway_token: Optional[str] = None,
+         elevenlabs_api_key: Optional[str] = None,
+         elevenlabs_voice_id: str = "qA5SHJ9UjGlW2QwXWR7w",
+         head_wobbler: Optional["HeadWobbler"] = None,
+         on_listening: Optional[Callable[[], None]] = None,
+         on_thinking: Optional[Callable[[], None]] = None,
+         on_speaking: Optional[Callable[[], None]] = None,
+         profile_mode: bool = False,
+         on_profile_complete: Optional[Callable[[ConversationTiming], None]] = None,
+     ):
+         """Initialize the handler.
+
+         Args:
+             gateway_url: Clawdbot gateway URL
+             gateway_token: Gateway auth token
+             elevenlabs_api_key: ElevenLabs API key
+             elevenlabs_voice_id: ElevenLabs voice ID
+             head_wobbler: HeadWobbler instance for audio-driven head movement
+             on_listening: Callback when listening starts
+             on_thinking: Callback when processing/thinking
+             on_speaking: Callback when speaking starts
+             profile_mode: If True, print timing summary after each turn
+             on_profile_complete: Callback with timing data after each turn completes
+         """
+         self.gateway_url = gateway_url
+         self.gateway_token = gateway_token or os.getenv("CLAWDBOT_TOKEN")
+         self.elevenlabs_api_key = elevenlabs_api_key or os.getenv("ELEVENLABS_API_KEY")
+         self.elevenlabs_voice_id = elevenlabs_voice_id
+         self.head_wobbler = head_wobbler
+
+         # State callbacks
+         self.on_listening = on_listening
+         self.on_thinking = on_thinking
+         self.on_speaking = on_speaking
+
+         # Profiling
+         self.profile_mode = profile_mode
+         self.on_profile_complete = on_profile_complete
+         self._current_timing: Optional[ConversationTiming] = None
+         self._timing_history: list[ConversationTiming] = []
+
+         # Audio buffers
+         self._audio_buffer: list[NDArray[np.float32]] = []
+         self._buffer_lock = threading.Lock()
+
+         # Speech detection state
+         self._is_speaking = False
+         self._silence_start: Optional[float] = None
+         self._speech_start: Optional[float] = None
+
+         # Output queue for TTS audio
+         self.output_queue: asyncio.Queue[Tuple[int, NDArray[np.float32]]] = asyncio.Queue()
+
+         # Whisper model (lazy load)
+         self._whisper_model = None
+
+         # Processing state
+         self._processing = False
+         self._stop_event = threading.Event()
+
+     def _load_whisper(self):
+         """Lazy load the Whisper model."""
+         if self._whisper_model is None:
+             from faster_whisper import WhisperModel
+             logger.info("Loading Whisper model...")
+             self._whisper_model = WhisperModel("small.en")
+             logger.info("Whisper model loaded")
+         return self._whisper_model
+
+     def _compute_rms(self, audio: NDArray[np.float32]) -> float:
+         """Compute the RMS of an audio signal."""
+         return float(np.sqrt(np.mean(audio ** 2)))
+
+     async def receive(self, audio_frame: Tuple[int, NDArray]) -> None:
+         """Receive an audio frame from the microphone.
+
+         Args:
+             audio_frame: Tuple of (sample_rate, audio_data)
+         """
+         input_sr, audio_data = audio_frame
+
+         # Convert to float32 if needed
+         if audio_data.dtype == np.int16:
+             audio_data = audio_data.astype(np.float32) / 32768.0
+
+         # Resample to 16kHz if needed
+         if input_sr != SAMPLE_RATE:
+             from scipy.signal import resample
+             num_samples = int(len(audio_data) * SAMPLE_RATE / input_sr)
+             audio_data = resample(audio_data, num_samples).astype(np.float32)
+
+         # Check for speech
+         rms = self._compute_rms(audio_data)
+         is_speech = rms > SILENCE_THRESHOLD
+
+         now = time.time()
+
+         with self._buffer_lock:
+             if is_speech:
+                 # Speech detected
+                 if not self._is_speaking:
+                     self._is_speaking = True
+                     self._speech_start = now
+                     self._silence_start = None
+                     if self.on_listening:
+                         self.on_listening()
+                     logger.debug("Speech started")
+
+                 self._audio_buffer.append(audio_data)
+                 self._silence_start = None
+
+             else:
+                 # Silence
+                 if self._is_speaking:
+                     # Still accumulating (might resume speaking)
+                     self._audio_buffer.append(audio_data)
+
+                     if self._silence_start is None:
+                         self._silence_start = now
+                     elif now - self._silence_start > SILENCE_DURATION:
+                         # End of utterance
+                         speech_duration = now - (self._speech_start or now)
+                         if speech_duration >= MIN_SPEECH_DURATION:
+                             # Process the utterance
+                             audio_to_process = np.concatenate(self._audio_buffer)
+                             speech_start_time = self._speech_start
+                             speech_end_time = now
+                             self._audio_buffer.clear()
+                             self._is_speaking = False
+                             self._silence_start = None
+
+                             # Process in background with timing info
+                             asyncio.create_task(self._process_utterance(
+                                 audio_to_process,
+                                 speech_start_time,
+                                 speech_end_time
+                             ))
+                         else:
+                             # Too short, discard
+                             self._audio_buffer.clear()
+                             self._is_speaking = False
+                             self._silence_start = None
+                             logger.debug("Utterance too short, discarding")
+
+     async def _process_utterance(
+         self,
+         audio: NDArray[np.float32],
+         speech_start: Optional[float] = None,
+         speech_end: Optional[float] = None,
+     ) -> None:
+         """Process a complete utterance: transcribe, stream to Clawdbot, stream TTS."""
+         if self._processing:
+             logger.warning("Already processing, skipping utterance")
+             return
+
+         self._processing = True
+
+         # Initialize timing for this turn
+         timing = ConversationTiming()
+         timing.turn_start = time.time()
+         timing.speech_start = speech_start or timing.turn_start
+         timing.speech_end = speech_end or timing.turn_start
+         self._current_timing = timing
+
+         try:
+             if self.on_thinking:
+                 self.on_thinking()
+
+             # 1. Transcribe with Whisper
+             logger.info("Transcribing...")
+             timing.transcription_start = time.time()
+             transcript = await self._transcribe(audio)
+             timing.transcription_end = time.time()
+
+             if not transcript or transcript.strip() == "":
+                 logger.debug("Empty transcript, skipping")
+                 return
+
+             timing.user_text = transcript
+             logger.info(f"User said: {transcript}")
+
+             # 2. Stream from Clawdbot directly to TTS via WebSocket
+             # This starts speaking as soon as LLM tokens arrive
+             logger.info("Streaming response...")
+             if self.on_speaking:
+                 self.on_speaking()
+
+             await self._stream_llm_to_tts(transcript, timing)
+
+             timing.turn_end = time.time()
+
+             # Print/record timing summary
+             if self.profile_mode:
+                 timing.print_summary()
+
+             self._timing_history.append(timing)
+
+             if self.on_profile_complete:
+                 self.on_profile_complete(timing)
+
+         except Exception as e:
+             logger.error(f"Error processing utterance: {e}", exc_info=True)
+         finally:
+             self._processing = False
+             self._current_timing = None
+
+     async def _transcribe(self, audio: NDArray[np.float32]) -> str:
+         """Transcribe audio using Whisper."""
+         model = self._load_whisper()
+
+         # Write to a temp file (Whisper expects a file path)
+         with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
+             sf.write(f.name, audio, SAMPLE_RATE)
+             temp_path = f.name
+
+         try:
+             # Run in an executor so transcription doesn't block the event loop
+             loop = asyncio.get_running_loop()
+             segments, _ = await loop.run_in_executor(
+                 None,
+                 lambda: model.transcribe(temp_path, language="en")
+             )
+             # faster-whisper returns an iterator of segments
+             return "".join(segment.text for segment in segments).strip()
+         finally:
+             Path(temp_path).unlink(missing_ok=True)
+
+     async def _stream_clawdbot(self, message: str) -> AsyncIterator[str]:
+         """Stream a response from Clawdbot via the OpenAI-compatible SSE endpoint.
+
+         Uses httpx-sse for proper SSE parsing without buffering issues.
+         """
+         async with httpx.AsyncClient(timeout=httpx.Timeout(120.0)) as client:
+             headers = {
+                 "Content-Type": "application/json",
+                 "x-clawdbot-agent-id": "main",
+             }
+             if self.gateway_token:
+                 headers["Authorization"] = f"Bearer {self.gateway_token}"
+
+             url = f"{self.gateway_url}/v1/chat/completions"
+             payload = {
+                 "model": "clawdbot:main",
+                 "messages": [{"role": "user", "content": message}],
+                 "user": "moltbot-body",
+                 "stream": True,
+             }
+
+             stream_start_time = time.time()
+             logger.info(f"[STREAM] Opening SSE connection to {url}")
+
+             try:
+                 async with aconnect_sse(
+                     client,
+                     "POST",
+                     url,
+                     json=payload,
+                     headers=headers
+                 ) as event_source:
+                     # Check response status
+                     event_source.response.raise_for_status()
+
+                     connection_time = time.time() - stream_start_time
+                     content_type = event_source.response.headers.get("content-type", "")
+                     logger.info(f"[STREAM] SSE connected in {connection_time:.2f}s, content-type: {content_type}")
+
+                     first_event_time = None
+                     event_count = 0
+
+                     # Iterate over SSE events (no buffering!)
+                     async for sse in event_source.aiter_sse():
+                         event_count += 1
+                         now = time.time()
+                         elapsed = now - stream_start_time
+
+                         if first_event_time is None:
+                             first_event_time = now
+                             logger.info(f"[STREAM] First SSE event at {elapsed:.2f}s")
+
+                         # Check for stream end
+                         if sse.data == "[DONE]":
+                             break
+
+                         # Parse the JSON data
+                         try:
+                             data = json.loads(sse.data)
+                             choices = data.get("choices", [])
+                             if choices:
+                                 delta = choices[0].get("delta", {})
+                                 content = delta.get("content", "")
+                                 if content:
+                                     logger.debug(f"[STREAM] Event {event_count} at {elapsed:.2f}s: {content[:50]}")
+                                     yield content
+                         except json.JSONDecodeError:
+                             logger.debug(f"[STREAM] Non-JSON SSE data: {sse.data[:50]}")
+                             continue
+
+                     # Log stream completion stats
+                     stream_end_time = time.time()
+                     total_stream_time = stream_end_time - stream_start_time
+                     if first_event_time:
+                         time_to_first = first_event_time - stream_start_time
+                         streaming_duration = stream_end_time - first_event_time
+                         logger.info(f"[STREAM] Complete: {event_count} events in {total_stream_time:.2f}s "
+                                     f"(TTFE: {time_to_first:.2f}s, streaming: {streaming_duration:.2f}s)")
+                     else:
+                         logger.warning(f"[STREAM] Complete: No events received in {total_stream_time:.2f}s")
+
+             except httpx.HTTPStatusError as e:
+                 logger.error(f"Clawdbot HTTP error: {e.response.status_code} - {e.response.text[:200]}")
+             except Exception as e:
+                 logger.error(f"Clawdbot streaming error: {e}")
+
+     async def _stream_llm_to_tts(
+         self,
+         message: str,
+         timing: Optional[ConversationTiming] = None
+     ) -> None:
+         """Stream the LLM response directly to ElevenLabs WebSocket TTS for minimal latency.
+
+         Waits for the first LLM token before opening the WebSocket to avoid the
+         20-second idle timeout, then streams remaining tokens as they arrive.
+         """
+         if not self.elevenlabs_api_key:
+             logger.warning("No ElevenLabs API key, skipping TTS")
+             return
+
+         tts_sample_rate = 22050
+         ws_url = f"wss://api.elevenlabs.io/v1/text-to-speech/{self.elevenlabs_voice_id}/stream-input?model_id=eleven_flash_v2_5&output_format=pcm_22050"
+
+         full_response = []  # Collect for logging and fallback
+         receive_task = None
+         ws = None
+
+         # Track timing for TTS audio reception
+         first_audio_received = False
+
+         try:
+             # Get async iterator for LLM tokens
+             if timing:
+                 timing.llm_request_start = time.time()
+
+             llm_stream = self._stream_clawdbot(message)
+
+             # Wait for the first token before opening the WebSocket
+             logger.info("Waiting for first LLM token...")
+             first_token = None
+             async for token in llm_stream:
+                 first_token = token
+                 full_response.append(token)
+                 if timing:
+                     timing.llm_first_token = time.time()
+                 logger.debug(f"First token received: {token[:50] if len(token) > 50 else token}")
+                 break
+
+             if first_token is None:
+                 logger.warning("No tokens received from LLM")
+                 return
+
+             # Now open the WebSocket - we have tokens to send
+             logger.info("Opening TTS WebSocket...")
+             ws = await websockets.connect(ws_url)
+
+             if timing:
+                 timing.tts_websocket_open = time.time()
+
+             # Initialize the WebSocket connection
+             init_message = {
+                 "text": " ",  # Initial space to start the stream
+                 "voice_settings": {
+                     "stability": 0.5,
+                     "similarity_boost": 0.75,
+                 },
+                 "xi_api_key": self.elevenlabs_api_key,
+                 "auto_mode": True,  # Let ElevenLabs handle chunk timing
+             }
+             await ws.send(json.dumps(init_message))
+             logger.debug("ElevenLabs WebSocket initialized")
+
+             # Task to receive audio from the WebSocket and queue it for playback
+             async def receive_audio():
+                 nonlocal first_audio_received
+                 try:
+                     async for msg in ws:
+                         try:
+                             data = json.loads(msg)
+                             audio_b64 = data.get("audio")
+                             if audio_b64:
+                                 # Track first audio timing
+                                 if timing and not first_audio_received:
+                                     timing.tts_first_audio = time.time()
+                                     first_audio_received = True
+
+                                 if timing:
+                                     timing.tts_audio_chunks += 1
+                                     timing.tts_last_audio = time.time()
+
+                                 # Decode base64 PCM audio
+                                 audio_bytes = base64.b64decode(audio_b64)
+                                 audio_int16 = np.frombuffer(audio_bytes, dtype=np.int16)
+                                 audio_data = audio_int16.astype(np.float32) / 32768.0
+
+                                 # Feed to head wobbler
+                                 if self.head_wobbler is not None:
+                                     self.head_wobbler.feed(audio_b64)
+
+                                 # Queue for playback
+                                 await self.output_queue.put((tts_sample_rate, audio_data))
+
+                             # Check if the stream is done
+                             if data.get("isFinal"):
+                                 logger.debug("ElevenLabs stream complete")
+                                 break
+                         except json.JSONDecodeError:
+                             continue
+                 except websockets.exceptions.ConnectionClosed as e:
+                     logger.debug(f"WebSocket closed during receive: {e}")
+
+             # Start receiving audio in the background
+             receive_task = asyncio.create_task(receive_audio())
+
+             # Send the first token
+             logger.debug(f"Sending token 1 to TTS: {first_token[:50] if len(first_token) > 50 else first_token}")
+             await ws.send(json.dumps({"text": first_token}))
+
+             # Continue streaming remaining tokens
+             token_count = 1
+             async for token in llm_stream:
+                 full_response.append(token)
+                 token_count += 1
+                 if timing:
+                     timing.llm_last_token = time.time()
+                 logger.debug(f"Sending token {token_count} to TTS: {token[:50] if len(token) > 50 else token}")
+                 await ws.send(json.dumps({"text": token}))
+
+             if timing:
+                 timing.llm_token_count = token_count
+                 if not timing.llm_last_token:
+                     timing.llm_last_token = time.time()
+
+             logger.info(f"Sent {token_count} tokens to TTS")
+
+             # Signal end of text
+             await ws.send(json.dumps({"text": ""}))
+
+             # Wait for audio to finish, with a timeout
+             try:
+                 await asyncio.wait_for(receive_task, timeout=60.0)
+             except asyncio.TimeoutError:
+                 logger.warning("Timeout waiting for TTS audio, continuing")
+                 receive_task.cancel()
+
+             response_text = "".join(full_response)
+             if timing:
+                 timing.assistant_text = response_text
+             logger.info(f"Clawdbot response: {response_text[:100]}...")
+
+         except websockets.exceptions.ConnectionClosedError as e:
+             logger.warning(f"WebSocket closed: {e}")
+             # Fall back to HTTP streaming with the accumulated response
+             if full_response:
+                 if timing:
+                     timing.assistant_text = "".join(full_response)
+                 logger.info("Falling back to HTTP streaming TTS")
+                 await self._generate_tts_fallback("".join(full_response))
+         except Exception as e:
+             logger.error(f"LLM-to-TTS pipeline error: {e}", exc_info=True)
+             # Fallback: if the WebSocket fails, try the accumulated response with HTTP streaming
+             if full_response:
+                 if timing:
+                     timing.assistant_text = "".join(full_response)
+                 logger.info("Falling back to HTTP streaming TTS")
+                 await self._generate_tts_fallback("".join(full_response))
+         finally:
+             if receive_task and not receive_task.done():
+                 receive_task.cancel()
+             if ws:
+                 await ws.close()
+
+     async def _generate_tts_fallback(self, text: str) -> None:
+         """Fallback TTS using HTTP streaming (used if the WebSocket fails)."""
+         if not self.elevenlabs_api_key or not text:
+             return
+
+         tts_sample_rate = 22050
+
+         async with httpx.AsyncClient() as client:
+             try:
+                 async with client.stream(
+                     "POST",
+                     f"https://api.elevenlabs.io/v1/text-to-speech/{self.elevenlabs_voice_id}/stream",
+                     params={
+                         "output_format": "pcm_22050",
+                         "optimize_streaming_latency": "3",
+                     },
+                     json={
+                         "text": text,
+                         "model_id": "eleven_flash_v2_5",
+                         "voice_settings": {
+                             "stability": 0.5,
+                             "similarity_boost": 0.75,
+                         }
+                     },
+                     headers={
+                         "xi-api-key": self.elevenlabs_api_key,
+                         "Content-Type": "application/json",
+                     },
+                     timeout=60.0,
+                 ) as response:
+                     response.raise_for_status()
+
+                     # PCM16 frames are 2 bytes; carry any odd trailing byte into the next chunk
+                     leftover = b""
+                     async for chunk in response.aiter_bytes(chunk_size=4096):
+                         if not chunk:
+                             continue
+
+                         chunk = leftover + chunk
+                         if len(chunk) % 2:
+                             chunk, leftover = chunk[:-1], chunk[-1:]
+                         else:
+                             leftover = b""
+
+                         audio_int16 = np.frombuffer(chunk, dtype=np.int16)
+                         audio_data = audio_int16.astype(np.float32) / 32768.0
+
+                         if self.head_wobbler is not None:
+                             audio_b64 = base64.b64encode(audio_int16.tobytes()).decode()
+                             self.head_wobbler.feed(audio_b64)
+
+                         await self.output_queue.put((tts_sample_rate, audio_data))
+
+             except Exception as e:
+                 logger.error(f"TTS fallback error: {e}")
+
+     async def emit(self) -> Optional[Tuple[int, NDArray[np.float32]]]:
+         """Get the next audio chunk for playback."""
+         try:
+             return await asyncio.wait_for(self.output_queue.get(), timeout=0.1)
+         except asyncio.TimeoutError:
+             return None
+
+     def stop(self) -> None:
+         """Stop the handler."""
+         self._stop_event.set()
moltbot_body/main.py ADDED
@@ -0,0 +1,322 @@
+ """Main entry point for Moltbot's body control."""
+
+ import os
+ import sys
+ import asyncio
+ import logging
+ import argparse
+ import threading
+ from pathlib import Path
+ from typing import Optional
+
+ from dotenv import load_dotenv
+ from reachy_mini import ReachyMini, ReachyMiniApp
+
+ # Load environment from project root (.env next to pyproject.toml)
+ _project_root = Path(__file__).parent.parent
+ load_dotenv(_project_root / ".env")
+
+ logger = logging.getLogger(__name__)
+
+
+ def setup_logging(debug: bool = False) -> None:
+     """Configure logging."""
+     level = logging.DEBUG if debug else logging.INFO
+     logging.basicConfig(
+         level=level,
+         format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+         datefmt="%H:%M:%S",
+     )
+
+
+ def parse_args() -> argparse.Namespace:
+     """Parse command line arguments."""
+     parser = argparse.ArgumentParser(description="Moltbot's body control")
+     parser.add_argument("--debug", action="store_true", help="Enable debug logging")
+     parser.add_argument("--robot-name", type=str, help="Robot name for connection")
+     parser.add_argument(
+         "--gateway-url",
+         type=str,
+         default="http://localhost:18789",
+         help="Clawdbot gateway URL",
+     )
+     parser.add_argument(
+         "--profile",
+         action="store_true",
+         help="Enable timing profiler - prints detailed timing after each turn",
+     )
+     parser.add_argument(
+         "--profile-once",
+         action="store_true",
+         help="Profile one conversation turn then exit (implies --profile)",
+     )
+     return parser.parse_args()
+
+
+ class MoltbotBodyCore:
+     """Main class controlling Moltbot's physical body."""
+
+     def __init__(
+         self,
+         gateway_url: str = "http://localhost:18789",
+         robot_name: Optional[str] = None,
+         profile_mode: bool = False,
+         profile_once: bool = False,
+         robot: Optional[ReachyMini] = None,
+         external_stop_event: Optional[threading.Event] = None,
+     ):
+         """Initialize Moltbot's body.
+
+         Args:
+             gateway_url: Clawdbot gateway URL
+             robot_name: Optional robot name for connection
+             profile_mode: Enable timing profiler
+             profile_once: Exit after one conversation turn (implies profile_mode)
+             robot: Optional pre-initialized ReachyMini instance (for app framework)
+             external_stop_event: Optional external stop event (for app framework)
+         """
+         from moltbot_body.clawdbot_handler import ClawdbotHandler
+         from moltbot_body.moves import MovementManager
+         from moltbot_body.audio.head_wobbler import HeadWobbler
+
+         self.gateway_url = gateway_url
+         self.profile_once = profile_once
+         self._external_stop_event = external_stop_event
+         self._owns_robot = robot is None  # Track if we created the robot
+
+         # Use the provided robot or create one
+         if robot is not None:
+             self.robot = robot
+             logger.info("Using provided Reachy Mini instance")
+         else:
+             # Connect to the robot
+             logger.info("Connecting to Reachy Mini...")
+             robot_kwargs = {}
+             if robot_name:
+                 robot_kwargs["robot_name"] = robot_name
+
+             try:
+                 self.robot = ReachyMini(**robot_kwargs)
+             except TimeoutError as e:
+                 logger.error(f"Connection timeout: Failed to connect to Reachy Mini. Details: {e}")
+                 logger.error("Check that the robot is powered on and reachable on the network.")
+                 sys.exit(1)
+             except ConnectionError as e:
+                 logger.error(f"Connection failed: Unable to establish connection. Details: {e}")
+                 sys.exit(1)
+             except Exception as e:
+                 logger.error(f"Unexpected error during robot initialization: {type(e).__name__}: {e}")
+                 sys.exit(1)
+
+         logger.info(f"Connected to robot: {self.robot.client.get_status()}")
+
+         # Initialize the movement system
+         logger.info("Initializing movement manager...")
+         self.movement_manager = MovementManager(current_robot=self.robot)
+         self.head_wobbler = HeadWobbler(set_speech_offsets=self.movement_manager.set_speech_offsets)
+
+         # Initialize the handler
+         gateway_token = os.getenv("CLAWDBOT_TOKEN")
+         if not gateway_token:
+             logger.warning("CLAWDBOT_TOKEN not found in environment - auth may fail")
+         else:
+             logger.debug(f"Gateway token loaded ({len(gateway_token)} chars)")
+
+         # Callback to handle profile completion
+         def on_profile_complete(timing):
+             if self.profile_once:
+                 logger.info("Profile complete - scheduling shutdown...")
+                 self._stop_event.set()
+
+         self.handler = ClawdbotHandler(
+             gateway_url=gateway_url,
+             gateway_token=gateway_token,
+             elevenlabs_api_key=os.getenv("ELEVENLABS_API_KEY"),
+             head_wobbler=self.head_wobbler,
+             on_listening=self._on_listening,
+             on_thinking=self._on_thinking,
+             on_speaking=self._on_speaking,
+             profile_mode=profile_mode or profile_once,
+             on_profile_complete=on_profile_complete if profile_once else None,
+         )
+
+         # State
+         self._stop_event = asyncio.Event()
+         self._tasks: list[asyncio.Task] = []
+
+     def _on_listening(self) -> None:
+         """Callback when listening starts."""
+         logger.info("Listening...")
+         self.movement_manager.set_listening(True)
+
+     def _on_thinking(self) -> None:
+         """Callback when thinking/processing."""
+         logger.info("Thinking...")
+         self.movement_manager.set_listening(False)
+
+     def _on_speaking(self) -> None:
+         """Callback when speaking starts."""
+         logger.info("Speaking...")
+         self.head_wobbler.reset()  # Clear any stale audio from the previous utterance
+
+     def _should_stop(self) -> bool:
+         """Check if we should stop (internal or external stop event)."""
+         if self._stop_event.is_set():
+             return True
+         if self._external_stop_event is not None and self._external_stop_event.is_set():
+             return True
+         return False
+
+     async def record_loop(self) -> None:
+         """Read audio from the robot microphone and send it to the handler."""
+         input_sample_rate = self.robot.media.get_input_audio_samplerate()
+         logger.info(f"Recording at {input_sample_rate} Hz")
+
+         while not self._should_stop():
+             audio_frame = self.robot.media.get_audio_sample()
+             if audio_frame is not None:
+                 await self.handler.receive((input_sample_rate, audio_frame))
+             await asyncio.sleep(0.01)  # ~100Hz polling
+
+     async def play_loop(self) -> None:
+         """Play audio from the handler through the robot speakers."""
+         output_sample_rate = self.robot.media.get_output_audio_samplerate()
+         logger.info(f"Playing at {output_sample_rate} Hz")
+
+         while not self._should_stop():
+             output = await self.handler.emit()
+             if output is not None:
+                 input_sr, audio_data = output
+
+                 # Resample if needed
+                 if input_sr != output_sample_rate:
+                     from scipy.signal import resample
+                     num_samples = int(len(audio_data) * output_sample_rate / input_sr)
+                     audio_data = resample(audio_data, num_samples).astype("float32")
+
+                 self.robot.media.push_audio_sample(audio_data)
+
+             await asyncio.sleep(0.01)
+
+     async def run(self) -> None:
+         """Run the main loop."""
+         # Start the movement system
+         logger.info("Starting movement manager...")
+         self.movement_manager.start()
+         self.head_wobbler.start()
+
+         # Start media
+         logger.info("Starting audio...")
+         self.robot.media.start_recording()
+         self.robot.media.start_playing()
+         await asyncio.sleep(1)  # Let pipelines initialize without blocking the event loop
+
+         logger.info("Ready! Speak to me...")
+
+         # Start tasks
+         self._tasks = [
+             asyncio.create_task(self.record_loop(), name="record-loop"),
+             asyncio.create_task(self.play_loop(), name="play-loop"),
+         ]
+
+         try:
+             await asyncio.gather(*self._tasks)
+         except asyncio.CancelledError:
+             logger.info("Tasks cancelled")
+
+     def stop(self) -> None:
+         """Stop everything."""
+         logger.info("Stopping...")
+         self._stop_event.set()
+
+         # Cancel tasks
+         for task in self._tasks:
+             if not task.done():
+                 task.cancel()
+
+         # Stop the movement system (MovementManager resets to neutral on stop)
+         self.head_wobbler.stop()
+         self.movement_manager.stop()
+
+         # Only manage robot resources if we created the robot
+         if self._owns_robot:
+             # Close media
+             try:
+                 self.robot.media.close()
+             except Exception as e:
+                 logger.debug(f"Error closing media: {e}")
+
+             # Disconnect
+             self.robot.client.disconnect()
+
+         self.handler.stop()
+
+         logger.info("Stopped")
+
+
+ class MoltbotBody(ReachyMiniApp):
+     """Reachy Mini Apps entry point for Moltbot Body.
+
+     This class allows Moltbot Body to be installed and run from the
+     Reachy Mini dashboard as a Reachy Mini App.
+     """
+
+     # No custom settings UI for now
+     custom_app_url: str | None = None
+
+     def run(self, reachy_mini: ReachyMini, stop_event: threading.Event) -> None:
+         """Run Moltbot Body as a Reachy Mini App.
+
+         Args:
+             reachy_mini: Pre-initialized ReachyMini instance from the framework
+             stop_event: Threading event to signal when the app should stop
+         """
+         # Create a new event loop for async operations
+         loop = asyncio.new_event_loop()
+         asyncio.set_event_loop(loop)
+
+         # Get the gateway URL from the environment
+         gateway_url = os.getenv("CLAWDBOT_GATEWAY_URL", "http://localhost:18789")
+
+         # Create the body controller with the provided robot instance
+         body = MoltbotBodyCore(
+             gateway_url=gateway_url,
+             robot=reachy_mini,
+             external_stop_event=stop_event,
+         )
+
+         try:
+             loop.run_until_complete(body.run())
+         except Exception as e:
+             logger.error(f"Error running Moltbot Body: {e}")
+         finally:
+             body.stop()
+             loop.close()
+
+
+ def main() -> None:
+     """Entry point."""
+     args = parse_args()
+     setup_logging(args.debug)
+
+     if args.profile or args.profile_once:
+         logger.info("Profiling mode enabled")
+
+     body = MoltbotBodyCore(
+         gateway_url=args.gateway_url,
+         robot_name=args.robot_name,
+         profile_mode=args.profile,
+         profile_once=args.profile_once,
+     )
+
+     try:
+         asyncio.run(body.run())
+     except KeyboardInterrupt:
+         logger.info("Interrupted")
+     finally:
+         body.stop()
+
+
+ if __name__ == "__main__":
+     main()
moltbot_body/moves.py ADDED
@@ -0,0 +1,849 @@
1
+ """Movement system with sequential primary moves and additive secondary moves.
2
+
3
+ Design overview
4
+ - Primary moves (emotions, dances, goto, breathing) are mutually exclusive and run
5
+ sequentially.
6
+ - Secondary moves (speech sway, face tracking) are additive offsets applied on top
7
+ of the current primary pose.
8
+ - There is a single control point to the robot: `ReachyMini.set_target`.
9
+ - The control loop runs near 100 Hz and is phase-aligned via a monotonic clock.
10
+ - Idle behaviour starts an infinite `BreathingMove` after a short inactivity delay
11
+ unless listening is active.
12
+
13
+ Threading model
14
+ - A dedicated worker thread owns all real-time state and issues `set_target`
15
+ commands.
16
+ - Other threads communicate via a command queue (enqueue moves, mark activity,
17
+ toggle listening).
18
+ - Secondary offset producers set pending values guarded by locks; the worker
19
+ snaps them atomically.
20
+
21
+ Units and frames
22
+ - Secondary offsets are interpreted as metres for x/y/z and radians for
23
+ roll/pitch/yaw, expressed in the world frame as `compose_world_offset` requires.
24
+ - Antennas and `body_yaw` are in radians.
25
+ - Head pose composition uses `compose_world_offset(primary_head, secondary_head)`;
26
+ the secondary offset must therefore be expressed in the world frame.
27
+
28
+ Safety
29
+ - Listening freezes antennas, then blends them back on unfreeze.
30
+ - Interpolations and blends are used to avoid jumps at all times.
31
+ - `set_target` errors are rate-limited in logs.
32
+ """
33
+
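The units convention above is easy to get wrong from calling code, so here is a minimal sketch of building a secondary-offset tuple in the expected units (metres and radians); the variable names are illustrative, not part of the module's API:

```python
import math

# Hedged example: a secondary offset of 5 mm upward and 3 degrees of yaw,
# expressed in the units this module expects (metres / radians, world frame).
x, y, z = 0.0, 0.0, 0.005                        # translation in metres
roll, pitch, yaw = 0.0, 0.0, math.radians(3.0)   # rotation in radians
offsets = (x, y, z, roll, pitch, yaw)
print(round(offsets[5], 4))  # 0.0524
```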
34
+ from __future__ import annotations
35
+ import time
36
+ import logging
37
+ import threading
38
+ from queue import Empty, Queue
39
+ from typing import Any, Dict, Tuple
40
+ from collections import deque
41
+ from dataclasses import dataclass
42
+
43
+ import numpy as np
44
+ from numpy.typing import NDArray
45
+
46
+ from reachy_mini import ReachyMini
47
+ from reachy_mini.utils import create_head_pose
48
+ from reachy_mini.motion.move import Move
49
+ from reachy_mini.utils.interpolation import (
50
+ compose_world_offset,
51
+ linear_pose_interpolation,
52
+ )
53
+
54
+
55
+ logger = logging.getLogger(__name__)
56
+
57
+ # Configuration constants
58
+ CONTROL_LOOP_FREQUENCY_HZ = 100.0 # Hz - Target frequency for the movement control loop
59
+
60
+ # Type definitions
61
+ FullBodyPose = Tuple[NDArray[np.float32], Tuple[float, float], float] # (head_pose_4x4, antennas, body_yaw)
62
+
63
+
64
+ class BreathingMove(Move): # type: ignore
65
+ """Breathing move with interpolation to neutral and then continuous breathing patterns."""
66
+
67
+ def __init__(
68
+ self,
69
+ interpolation_start_pose: NDArray[np.float32],
70
+ interpolation_start_antennas: Tuple[float, float],
71
+ interpolation_duration: float = 1.0,
72
+ ):
73
+ """Initialize breathing move.
74
+
75
+ Args:
76
+ interpolation_start_pose: 4x4 matrix of current head pose to interpolate from
77
+ interpolation_start_antennas: Current antenna positions to interpolate from
78
+ interpolation_duration: Duration of interpolation to neutral (seconds)
79
+
80
+ """
81
+ self.interpolation_start_pose = interpolation_start_pose
82
+ self.interpolation_start_antennas = np.array(interpolation_start_antennas)
83
+ self.interpolation_duration = interpolation_duration
84
+
85
+ # Neutral positions for breathing base
86
+ self.neutral_head_pose = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
87
+ self.neutral_antennas = np.array([0.0, 0.0])
88
+
89
+ # Breathing parameters
90
+ self.breathing_z_amplitude = 0.005 # 5mm gentle breathing
91
+ self.breathing_frequency = 0.1 # Hz (6 breaths per minute)
92
+ self.antenna_sway_amplitude = np.deg2rad(15) # 15 degrees
93
+ self.antenna_frequency = 0.5 # Hz (faster antenna sway)
94
+
95
+ @property
96
+ def duration(self) -> float:
97
+ """Duration property required by official Move interface."""
98
+ return float("inf") # Continuous breathing (never ends naturally)
99
+
100
+ def evaluate(self, t: float) -> tuple[NDArray[np.float64] | None, NDArray[np.float64] | None, float | None]:
101
+ """Evaluate breathing move at time t."""
102
+ if t < self.interpolation_duration:
103
+ # Phase 1: Interpolate to neutral base position
104
+ interpolation_t = t / self.interpolation_duration
105
+
106
+ # Interpolate head pose
107
+ head_pose = linear_pose_interpolation(
108
+ self.interpolation_start_pose, self.neutral_head_pose, interpolation_t,
109
+ )
110
+
111
+ # Interpolate antennas
112
+ antennas_interp = (
113
+ 1 - interpolation_t
114
+ ) * self.interpolation_start_antennas + interpolation_t * self.neutral_antennas
115
+ antennas = antennas_interp.astype(np.float64)
116
+
117
+ else:
118
+ # Phase 2: Breathing patterns from neutral base
119
+ breathing_time = t - self.interpolation_duration
120
+
121
+ # Gentle z-axis breathing
122
+ z_offset = self.breathing_z_amplitude * np.sin(2 * np.pi * self.breathing_frequency * breathing_time)
123
+ head_pose = create_head_pose(x=0, y=0, z=z_offset, roll=0, pitch=0, yaw=0, degrees=True, mm=False)
124
+
125
+ # Antenna sway (opposite directions)
126
+ antenna_sway = self.antenna_sway_amplitude * np.sin(2 * np.pi * self.antenna_frequency * breathing_time)
127
+ antennas = np.array([antenna_sway, -antenna_sway], dtype=np.float64)
128
+
129
+ # Return in official Move interface format: (head_pose, antennas_array, body_yaw)
130
+ return (head_pose, antennas, 0.0)
131
+
132
+
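The breathing phase of `BreathingMove.evaluate` reduces to a slow sine on the z axis. A standalone sketch using the class defaults above (`breathing_z_offset` is a hypothetical helper, not part of the module):

```python
import numpy as np

# Class defaults assumed from BreathingMove above.
BREATHING_Z_AMPLITUDE = 0.005  # metres (5 mm)
BREATHING_FREQUENCY = 0.1      # Hz -> one breath every 10 s

def breathing_z_offset(t: float) -> float:
    """Vertical head offset (metres) at t seconds into the breathing phase."""
    return BREATHING_Z_AMPLITUDE * np.sin(2 * np.pi * BREATHING_FREQUENCY * t)

# The peak occurs a quarter-cycle (2.5 s) into each 10 s breath.
print(round(breathing_z_offset(2.5), 6))  # 0.005
```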
133
+ def combine_full_body(primary_pose: FullBodyPose, secondary_pose: FullBodyPose) -> FullBodyPose:
134
+ """Combine primary and secondary full body poses.
135
+
136
+ Args:
137
+ primary_pose: (head_pose, antennas, body_yaw) - primary move
138
+ secondary_pose: (head_pose, antennas, body_yaw) - secondary offsets
139
+
140
+ Returns:
141
+ Combined full body pose (head_pose, antennas, body_yaw)
142
+
143
+ """
144
+ primary_head, primary_antennas, primary_body_yaw = primary_pose
145
+ secondary_head, secondary_antennas, secondary_body_yaw = secondary_pose
146
+
147
+ # Combine head poses using compose_world_offset; the secondary pose must be an
148
+ # offset expressed in the world frame (T_off_world) applied to the absolute
149
+ # primary transform (T_abs).
150
+ combined_head = compose_world_offset(primary_head, secondary_head, reorthonormalize=True)
151
+
152
+ # Sum antennas and body_yaw
153
+ combined_antennas = (
154
+ primary_antennas[0] + secondary_antennas[0],
155
+ primary_antennas[1] + secondary_antennas[1],
156
+ )
157
+ combined_body_yaw = primary_body_yaw + secondary_body_yaw
158
+
159
+ return (combined_head, combined_antennas, combined_body_yaw)
160
+
161
+
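The head-pose composition can be pictured with plain homogeneous matrices. This is a numpy sketch, assuming (as the comment above implies) that a world-frame offset left-multiplies the absolute transform — the real behaviour is defined by `compose_world_offset` in `reachy_mini`:

```python
import numpy as np

def compose_world_offset_sketch(t_abs: np.ndarray, t_off_world: np.ndarray) -> np.ndarray:
    """Apply a world-frame offset to an absolute 4x4 pose (assumed convention)."""
    return t_off_world @ t_abs

primary = np.eye(4)
primary[:3, 3] = [0.0, 0.0, 0.02]    # primary move raised the head 20 mm
offset = np.eye(4)
offset[:3, 3] = [0.0, 0.0, 0.005]    # 5 mm secondary offset in the world frame
combined = compose_world_offset_sketch(primary, offset)
print(combined[:3, 3])               # translations add (z = 0.025)
```

With an identity offset the primary pose passes through unchanged, which is why zero secondary offsets are always safe.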
162
+ def clone_full_body_pose(pose: FullBodyPose) -> FullBodyPose:
163
+ """Create a deep copy of a full body pose tuple."""
164
+ head, antennas, body_yaw = pose
165
+ return (head.copy(), (float(antennas[0]), float(antennas[1])), float(body_yaw))
166
+
167
+
168
+ @dataclass
169
+ class MovementState:
170
+ """State tracking for the movement system."""
171
+
172
+ # Primary move state
173
+ current_move: Move | None = None
174
+ move_start_time: float | None = None
175
+ last_activity_time: float = 0.0
176
+
177
+ # Secondary move state (offsets)
178
+ speech_offsets: Tuple[float, float, float, float, float, float] = (
179
+ 0.0,
180
+ 0.0,
181
+ 0.0,
182
+ 0.0,
183
+ 0.0,
184
+ 0.0,
185
+ )
186
+ face_tracking_offsets: Tuple[float, float, float, float, float, float] = (
187
+ 0.0,
188
+ 0.0,
189
+ 0.0,
190
+ 0.0,
191
+ 0.0,
192
+ 0.0,
193
+ )
194
+
195
+ # Status flags
196
+ last_primary_pose: FullBodyPose | None = None
197
+
198
+ def update_activity(self) -> None:
199
+ """Update the last activity time."""
200
+ self.last_activity_time = time.monotonic()
201
+
202
+
203
+ @dataclass
204
+ class LoopFrequencyStats:
205
+ """Track rolling loop frequency statistics."""
206
+
207
+ mean: float = 0.0
208
+ m2: float = 0.0
209
+ min_freq: float = float("inf")
210
+ count: int = 0
211
+ last_freq: float = 0.0
212
+ potential_freq: float = 0.0
213
+
214
+ def reset(self) -> None:
215
+ """Reset accumulators while keeping the last potential frequency."""
216
+ self.mean = 0.0
217
+ self.m2 = 0.0
218
+ self.min_freq = float("inf")
219
+ self.count = 0
220
+
221
+
222
+ class MovementManager:
223
+ """Coordinate sequential moves, additive offsets, and robot output at 100 Hz.
224
+
225
+ Responsibilities:
226
+ - Own a real-time loop that samples the current primary move (if any), fuses
227
+ secondary offsets, and calls `set_target` exactly once per tick.
228
+ - Start an idle `BreathingMove` after `idle_inactivity_delay` when not
229
+ listening and no moves are queued.
230
+ - Expose thread-safe APIs so other threads can enqueue moves, mark activity,
231
+ or feed secondary offsets without touching internal state.
232
+
233
+ Timing:
234
+ - All elapsed-time calculations rely on `time.monotonic()` through `self._now`
235
+ to avoid wall-clock jumps.
236
+ - The loop targets `CONTROL_LOOP_FREQUENCY_HZ` (100 Hz) and sleeps away the remainder of each period.
237
+
238
+ Concurrency:
239
+ - External threads communicate via `_command_queue` messages.
240
+ - Secondary offsets are staged via dirty flags guarded by locks and consumed
241
+ atomically inside the worker loop.
242
+ """
243
+
244
+ def __init__(
245
+ self,
246
+ current_robot: ReachyMini,
247
+ camera_worker: "Any" = None,
248
+ ):
249
+ """Initialize movement manager."""
250
+ self.current_robot = current_robot
251
+ self.camera_worker = camera_worker
252
+
253
+ # Single timing source for durations
254
+ self._now = time.monotonic
255
+
256
+ # Movement state
257
+ self.state = MovementState()
258
+ self.state.last_activity_time = self._now()
259
+ neutral_pose = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
260
+ self.state.last_primary_pose = (neutral_pose, (0.0, 0.0), 0.0)
261
+
262
+ # Move queue (primary moves)
263
+ self.move_queue: deque[Move] = deque()
264
+
265
+ # Configuration
266
+ self.idle_inactivity_delay = 0.3 # seconds
267
+ self.target_frequency = CONTROL_LOOP_FREQUENCY_HZ
268
+ self.target_period = 1.0 / self.target_frequency
269
+
270
+ self._stop_event = threading.Event()
271
+ self._thread: threading.Thread | None = None
272
+ self._is_listening = False
273
+ self._last_commanded_pose: FullBodyPose = clone_full_body_pose(self.state.last_primary_pose)
274
+ self._listening_antennas: Tuple[float, float] = self._last_commanded_pose[1]
275
+ self._antenna_unfreeze_blend = 1.0
276
+ self._antenna_blend_duration = 0.4 # seconds to blend back after listening
277
+ self._last_listening_blend_time = self._now()
278
+ self._breathing_active = False # true when breathing move is running or queued
279
+ self._listening_debounce_s = 0.15
280
+ self._last_listening_toggle_time = self._now()
281
+ self._last_set_target_err = 0.0
282
+ self._set_target_err_interval = 1.0 # seconds between error logs
283
+ self._set_target_err_suppressed = 0
284
+
285
+ # Cross-thread signalling
286
+ self._command_queue: "Queue[Tuple[str, Any]]" = Queue()
287
+ self._speech_offsets_lock = threading.Lock()
288
+ self._pending_speech_offsets: Tuple[float, float, float, float, float, float] = (
289
+ 0.0,
290
+ 0.0,
291
+ 0.0,
292
+ 0.0,
293
+ 0.0,
294
+ 0.0,
295
+ )
296
+ self._speech_offsets_dirty = False
297
+
298
+ self._face_offsets_lock = threading.Lock()
299
+ self._pending_face_offsets: Tuple[float, float, float, float, float, float] = (
300
+ 0.0,
301
+ 0.0,
302
+ 0.0,
303
+ 0.0,
304
+ 0.0,
305
+ 0.0,
306
+ )
307
+ self._face_offsets_dirty = False
308
+
309
+ self._shared_state_lock = threading.Lock()
310
+ self._shared_last_activity_time = self.state.last_activity_time
311
+ self._shared_is_listening = self._is_listening
312
+ self._status_lock = threading.Lock()
313
+ self._freq_stats = LoopFrequencyStats()
314
+ self._freq_snapshot = LoopFrequencyStats()
315
+
316
+ def queue_move(self, move: Move) -> None:
317
+ """Queue a primary move to run after the currently executing one.
318
+
319
+ Thread-safe: the move is enqueued via the worker command queue so the
320
+ control loop remains the sole mutator of movement state.
321
+ """
322
+ self._command_queue.put(("queue_move", move))
323
+
324
+ def clear_move_queue(self) -> None:
325
+ """Stop the active move and discard any queued primary moves.
326
+
327
+ Thread-safe: executed by the worker thread via the command queue.
328
+ """
329
+ self._command_queue.put(("clear_queue", None))
330
+
331
+ def set_speech_offsets(self, offsets: Tuple[float, float, float, float, float, float]) -> None:
332
+ """Update speech-induced secondary offsets (x, y, z, roll, pitch, yaw).
333
+
334
+ Offsets are interpreted as metres for translation and radians for
335
+ rotation in the world frame. Thread-safe via a pending snapshot.
336
+ """
337
+ with self._speech_offsets_lock:
338
+ self._pending_speech_offsets = offsets
339
+ self._speech_offsets_dirty = True
340
+
341
+ def set_moving_state(self, duration: float) -> None:
342
+ """Mark the robot as actively moving for the provided duration.
343
+
344
+ Legacy hook used by goto helpers to keep inactivity and breathing logic
345
+ aware of manual motions. Thread-safe via the command queue.
346
+ """
347
+ self._command_queue.put(("set_moving_state", duration))
348
+
349
+ def is_idle(self) -> bool:
350
+ """Return True when the robot has been inactive longer than the idle delay."""
351
+ with self._shared_state_lock:
352
+ last_activity = self._shared_last_activity_time
353
+ listening = self._shared_is_listening
354
+
355
+ if listening:
356
+ return False
357
+
358
+ return self._now() - last_activity >= self.idle_inactivity_delay
359
+
360
+ def set_listening(self, listening: bool) -> None:
361
+ """Enable or disable listening mode without touching shared state directly.
362
+
363
+ While listening:
364
+ - Antenna positions are frozen at the last commanded values.
365
+ - Blending is reset so that upon unfreezing the antennas return smoothly.
366
+ - Idle breathing is suppressed.
367
+
368
+ Thread-safe: the change is posted to the worker command queue.
369
+ """
370
+ with self._shared_state_lock:
371
+ if self._shared_is_listening == listening:
372
+ return
373
+ self._command_queue.put(("set_listening", listening))
374
+
375
+ def _poll_signals(self, current_time: float) -> None:
376
+ """Apply queued commands and pending offset updates."""
377
+ self._apply_pending_offsets()
378
+
379
+ while True:
380
+ try:
381
+ command, payload = self._command_queue.get_nowait()
382
+ except Empty:
383
+ break
384
+ self._handle_command(command, payload, current_time)
385
+
386
+ def _apply_pending_offsets(self) -> None:
387
+ """Apply the most recent speech/face offset updates."""
388
+ speech_offsets: Tuple[float, float, float, float, float, float] | None = None
389
+ with self._speech_offsets_lock:
390
+ if self._speech_offsets_dirty:
391
+ speech_offsets = self._pending_speech_offsets
392
+ self._speech_offsets_dirty = False
393
+
394
+ if speech_offsets is not None:
395
+ self.state.speech_offsets = speech_offsets
396
+ self.state.update_activity()
397
+
398
+ face_offsets: Tuple[float, float, float, float, float, float] | None = None
399
+ with self._face_offsets_lock:
400
+ if self._face_offsets_dirty:
401
+ face_offsets = self._pending_face_offsets
402
+ self._face_offsets_dirty = False
403
+
404
+ if face_offsets is not None:
405
+ self.state.face_tracking_offsets = face_offsets
406
+ self.state.update_activity()
407
+
408
+ def _handle_command(self, command: str, payload: Any, current_time: float) -> None:
409
+ """Handle a single cross-thread command."""
410
+ if command == "queue_move":
411
+ if isinstance(payload, Move):
412
+ self.move_queue.append(payload)
413
+ self.state.update_activity()
414
+ duration = getattr(payload, "duration", None)
415
+ if duration is not None:
416
+ try:
417
+ duration_str = f"{float(duration):.2f}"
418
+ except (TypeError, ValueError):
419
+ duration_str = str(duration)
420
+ else:
421
+ duration_str = "?"
422
+ logger.debug(
423
+ "Queued move with duration %ss, queue size: %s",
424
+ duration_str,
425
+ len(self.move_queue),
426
+ )
427
+ else:
428
+ logger.warning("Ignored queue_move command with invalid payload: %s", payload)
429
+ elif command == "clear_queue":
430
+ self.move_queue.clear()
431
+ self.state.current_move = None
432
+ self.state.move_start_time = None
433
+ self._breathing_active = False
434
+ logger.info("Cleared move queue and stopped current move")
435
+ elif command == "set_moving_state":
436
+ try:
437
+ float(payload)  # Validate the payload; the legacy duration value is otherwise unused
438
+ except (TypeError, ValueError):
439
+ logger.warning("Invalid moving state duration: %s", payload)
440
+ return
441
+ self.state.update_activity()
442
+ elif command == "mark_activity":
443
+ self.state.update_activity()
444
+ elif command == "set_listening":
445
+ desired_state = bool(payload)
446
+ now = self._now()
447
+ if now - self._last_listening_toggle_time < self._listening_debounce_s:
448
+ return
449
+ self._last_listening_toggle_time = now
450
+
451
+ if self._is_listening == desired_state:
452
+ return
453
+
454
+ self._is_listening = desired_state
455
+ self._last_listening_blend_time = now
456
+ if desired_state:
457
+ # Freeze: snapshot current commanded antennas and reset blend
458
+ self._listening_antennas = (
459
+ float(self._last_commanded_pose[1][0]),
460
+ float(self._last_commanded_pose[1][1]),
461
+ )
462
+ self._antenna_unfreeze_blend = 0.0
463
+ else:
464
+ # Unfreeze: restart blending from frozen pose
465
+ self._antenna_unfreeze_blend = 0.0
466
+ self.state.update_activity()
467
+ else:
468
+ logger.warning("Unknown command received by MovementManager: %s", command)
469
+
470
+ def _publish_shared_state(self) -> None:
471
+ """Expose idle-related state for external threads."""
472
+ with self._shared_state_lock:
473
+ self._shared_last_activity_time = self.state.last_activity_time
474
+ self._shared_is_listening = self._is_listening
475
+
476
+ def _manage_move_queue(self, current_time: float) -> None:
477
+ """Manage the primary move queue (sequential execution)."""
478
+ if self.state.current_move is None or (
479
+ self.state.move_start_time is not None
480
+ and current_time - self.state.move_start_time >= self.state.current_move.duration
481
+ ):
482
+ self.state.current_move = None
483
+ self.state.move_start_time = None
484
+
485
+ if self.move_queue:
486
+ self.state.current_move = self.move_queue.popleft()
487
+ self.state.move_start_time = current_time
488
+ # Record whether the newly started move is the idle breathing move
489
+ self._breathing_active = isinstance(self.state.current_move, BreathingMove)
490
+ logger.debug(f"Starting new move, duration: {self.state.current_move.duration}s")
491
+
492
+ def _manage_breathing(self, current_time: float) -> None:
493
+ """Manage automatic breathing when idle."""
494
+ if (
495
+ self.state.current_move is None
496
+ and not self.move_queue
497
+ and not self._is_listening
498
+ and not self._breathing_active
499
+ ):
500
+ idle_for = current_time - self.state.last_activity_time
501
+ if idle_for >= self.idle_inactivity_delay:
502
+ try:
503
+ # These two calls return the latest cached sensor data without performing
504
+ # synchronous I/O, so they are acceptable inside the control loop.
505
+ _, current_antennas = self.current_robot.get_current_joint_positions()
506
+ current_head_pose = self.current_robot.get_current_head_pose()
507
+
508
+ self._breathing_active = True
509
+ self.state.update_activity()
510
+
511
+ breathing_move = BreathingMove(
512
+ interpolation_start_pose=current_head_pose,
513
+ interpolation_start_antennas=current_antennas,
514
+ interpolation_duration=1.0,
515
+ )
516
+ self.move_queue.append(breathing_move)
517
+ logger.debug("Started breathing after %.1fs of inactivity", idle_for)
518
+ except Exception as e:
519
+ self._breathing_active = False
520
+ logger.error("Failed to start breathing: %s", e)
521
+
522
+ if isinstance(self.state.current_move, BreathingMove) and self.move_queue:
523
+ self.state.current_move = None
524
+ self.state.move_start_time = None
525
+ self._breathing_active = False
526
+ logger.debug("Stopping breathing due to new move activity")
527
+
528
+ if self.state.current_move is not None and not isinstance(self.state.current_move, BreathingMove):
529
+ self._breathing_active = False
530
+
531
+ def _get_primary_pose(self, current_time: float) -> FullBodyPose:
532
+ """Get the primary full body pose from current move or neutral."""
533
+ # When a primary move is playing, sample it and cache the resulting pose
534
+ if self.state.current_move is not None and self.state.move_start_time is not None:
535
+ move_time = current_time - self.state.move_start_time
536
+ head, antennas, body_yaw = self.state.current_move.evaluate(move_time)
537
+
538
+ if head is None:
539
+ head = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
540
+ if antennas is None:
541
+ antennas = np.array([0.0, 0.0])
542
+ if body_yaw is None:
543
+ body_yaw = 0.0
544
+
545
+ antennas_tuple = (float(antennas[0]), float(antennas[1]))
546
+ head_copy = head.copy()
547
+ primary_full_body_pose = (
548
+ head_copy,
549
+ antennas_tuple,
550
+ float(body_yaw),
551
+ )
552
+
553
+ self.state.last_primary_pose = clone_full_body_pose(primary_full_body_pose)
554
+ # Otherwise reuse the last primary pose so we avoid jumps between moves
555
+ elif self.state.last_primary_pose is not None:
556
+ primary_full_body_pose = clone_full_body_pose(self.state.last_primary_pose)
557
+ else:
558
+ neutral_head_pose = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
559
+ primary_full_body_pose = (neutral_head_pose, (0.0, 0.0), 0.0)
560
+ self.state.last_primary_pose = clone_full_body_pose(primary_full_body_pose)
561
+
562
+ return primary_full_body_pose
563
+
564
+ def _get_secondary_pose(self) -> FullBodyPose:
565
+ """Get the secondary full body pose from speech and face tracking offsets."""
566
+ # Combine speech sway offsets + face tracking offsets for secondary pose
567
+ secondary_offsets = [
568
+ self.state.speech_offsets[0] + self.state.face_tracking_offsets[0],
569
+ self.state.speech_offsets[1] + self.state.face_tracking_offsets[1],
570
+ self.state.speech_offsets[2] + self.state.face_tracking_offsets[2],
571
+ self.state.speech_offsets[3] + self.state.face_tracking_offsets[3],
572
+ self.state.speech_offsets[4] + self.state.face_tracking_offsets[4],
573
+ self.state.speech_offsets[5] + self.state.face_tracking_offsets[5],
574
+ ]
575
+
576
+ secondary_head_pose = create_head_pose(
577
+ x=secondary_offsets[0],
578
+ y=secondary_offsets[1],
579
+ z=secondary_offsets[2],
580
+ roll=secondary_offsets[3],
581
+ pitch=secondary_offsets[4],
582
+ yaw=secondary_offsets[5],
583
+ degrees=False,
584
+ mm=False,
585
+ )
586
+ return (secondary_head_pose, (0.0, 0.0), 0.0)
587
+
588
+ def _compose_full_body_pose(self, current_time: float) -> FullBodyPose:
589
+ """Compose primary and secondary poses into a single command pose."""
590
+ primary = self._get_primary_pose(current_time)
591
+ secondary = self._get_secondary_pose()
592
+ return combine_full_body(primary, secondary)
593
+
594
+ def _update_primary_motion(self, current_time: float) -> None:
595
+ """Advance queue state and idle behaviours for this tick."""
596
+ self._manage_move_queue(current_time)
597
+ self._manage_breathing(current_time)
598
+
599
+ def _calculate_blended_antennas(self, target_antennas: Tuple[float, float]) -> Tuple[float, float]:
600
+ """Blend target antennas with listening freeze state and update blending."""
601
+ now = self._now()
602
+ listening = self._is_listening
603
+ listening_antennas = self._listening_antennas
604
+ blend = self._antenna_unfreeze_blend
605
+ blend_duration = self._antenna_blend_duration
606
+ last_update = self._last_listening_blend_time
607
+ self._last_listening_blend_time = now
608
+
609
+ if listening:
610
+ antennas_cmd = listening_antennas
611
+ new_blend = 0.0
612
+ else:
613
+ dt = max(0.0, now - last_update)
614
+ if blend_duration <= 0:
615
+ new_blend = 1.0
616
+ else:
617
+ new_blend = min(1.0, blend + dt / blend_duration)
618
+ antennas_cmd = (
619
+ listening_antennas[0] * (1.0 - new_blend) + target_antennas[0] * new_blend,
620
+ listening_antennas[1] * (1.0 - new_blend) + target_antennas[1] * new_blend,
621
+ )
622
+
623
+ if listening:
624
+ self._antenna_unfreeze_blend = 0.0
625
+ else:
626
+ self._antenna_unfreeze_blend = new_blend
627
+ if new_blend >= 1.0:
628
+ self._listening_antennas = (
629
+ float(target_antennas[0]),
630
+ float(target_antennas[1]),
631
+ )
632
+
633
+ return antennas_cmd
634
+
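The unfreeze behaviour above is a plain linear blend from the frozen listening antennas to the live target. A minimal sketch (`blend_antennas` is a hypothetical helper mirroring the arithmetic in `_calculate_blended_antennas`, with `blend` running from 0.0 to 1.0):

```python
def blend_antennas(frozen, target, blend):
    """Linearly blend each antenna from its frozen value to the live target."""
    return tuple(f * (1.0 - blend) + t * blend for f, t in zip(frozen, target))

frozen = (0.2, -0.2)   # radians, snapshotted when listening began
target = (0.0, 0.0)    # live commanded antennas after unfreezing
print(blend_antennas(frozen, target, 0.0))   # still frozen
print(blend_antennas(frozen, target, 1.0))   # fully live
print(blend_antennas(frozen, target, 0.5))   # halfway between
```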
635
+ def _issue_control_command(self, head: NDArray[np.float32], antennas: Tuple[float, float], body_yaw: float) -> None:
636
+ """Send the fused pose to the robot with throttled error logging."""
637
+ try:
638
+ self.current_robot.set_target(head=head, antennas=antennas, body_yaw=body_yaw)
639
+ except Exception as e:
640
+ now = self._now()
641
+ if now - self._last_set_target_err >= self._set_target_err_interval:
642
+ msg = f"Failed to set robot target: {e}"
643
+ if self._set_target_err_suppressed:
644
+ msg += f" (suppressed {self._set_target_err_suppressed} repeats)"
645
+ self._set_target_err_suppressed = 0
646
+ logger.error(msg)
647
+ self._last_set_target_err = now
648
+ else:
649
+ self._set_target_err_suppressed += 1
650
+ else:
651
+ with self._status_lock:
652
+ self._last_commanded_pose = clone_full_body_pose((head, antennas, body_yaw))
653
+
654
+ def _update_frequency_stats(
655
+ self, loop_start: float, prev_loop_start: float, stats: LoopFrequencyStats,
656
+ ) -> LoopFrequencyStats:
657
+ """Update frequency statistics based on the current loop start time."""
658
+ period = loop_start - prev_loop_start
659
+ if period > 0:
660
+ stats.last_freq = 1.0 / period
661
+ stats.count += 1
662
+ delta = stats.last_freq - stats.mean
663
+ stats.mean += delta / stats.count
664
+ stats.m2 += delta * (stats.last_freq - stats.mean)
665
+ stats.min_freq = min(stats.min_freq, stats.last_freq)
666
+ return stats
667
+
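The `mean`/`m2` updates above are Welford's online algorithm, which accumulates mean and variance in one pass without storing samples. A self-contained sketch of the same update step:

```python
def welford_update(mean: float, m2: float, count: int, x: float):
    """One step of Welford's online mean/variance update,
    mirroring the arithmetic in _update_frequency_stats."""
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)
    return mean, m2, count

mean = m2 = 0.0
count = 0
for x in [99.0, 101.0, 100.0, 98.0, 102.0]:  # loop frequencies in Hz
    mean, m2, count = welford_update(mean, m2, count, x)

variance = m2 / count  # population variance, as computed in _maybe_log_frequency
print(mean, variance)  # 100.0 2.0
```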
668
+ def _schedule_next_tick(self, loop_start: float, stats: LoopFrequencyStats) -> Tuple[float, LoopFrequencyStats]:
669
+ """Compute sleep time to maintain target frequency and update potential freq."""
670
+ computation_time = self._now() - loop_start
671
+ stats.potential_freq = 1.0 / computation_time if computation_time > 0 else float("inf")
672
+ sleep_time = max(0.0, self.target_period - computation_time)
673
+ return sleep_time, stats
674
+
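The scheduling rule above is simply "sleep whatever remains of the period, never a negative amount"; on overrun the loop runs late rather than trying to catch up. A tiny sketch of that calculation:

```python
TARGET_PERIOD = 1.0 / 100.0  # seconds, matching CONTROL_LOOP_FREQUENCY_HZ

def sleep_for(computation_time: float) -> float:
    """Remaining sleep needed to hold the 100 Hz cadence, clamped at zero."""
    return max(0.0, TARGET_PERIOD - computation_time)

print(sleep_for(0.002))  # fast tick: sleep ~8 ms
print(sleep_for(0.015))  # overrun: sleep 0, this tick finishes late
```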
675
+     def _record_frequency_snapshot(self, stats: LoopFrequencyStats) -> None:
+         """Store a thread-safe snapshot of current frequency statistics."""
+         with self._status_lock:
+             self._freq_snapshot = LoopFrequencyStats(
+                 mean=stats.mean,
+                 m2=stats.m2,
+                 min_freq=stats.min_freq,
+                 count=stats.count,
+                 last_freq=stats.last_freq,
+                 potential_freq=stats.potential_freq,
+             )
+ 
+     def _maybe_log_frequency(self, loop_count: int, print_interval_loops: int, stats: LoopFrequencyStats) -> None:
+         """Emit frequency telemetry when enough loops have elapsed."""
+         if loop_count % print_interval_loops != 0 or stats.count == 0:
+             return
+ 
+         variance = stats.m2 / stats.count if stats.count > 0 else 0.0
+         lowest = stats.min_freq if stats.min_freq != float("inf") else 0.0
+         logger.debug(
+             "Loop freq - avg: %.2fHz, variance: %.4f, min: %.2fHz, last: %.2fHz, potential: %.2fHz, target: %.1fHz",
+             stats.mean,
+             variance,
+             lowest,
+             stats.last_freq,
+             stats.potential_freq,
+             self.target_frequency,
+         )
+         stats.reset()
+ 
+     def _update_face_tracking(self, current_time: float) -> None:
+         """Get face tracking offsets from the camera worker thread."""
+         if self.camera_worker is not None:
+             offsets = self.camera_worker.get_face_tracking_offsets()
+             self.state.face_tracking_offsets = offsets
+         else:
+             # No camera worker, use neutral offsets
+             self.state.face_tracking_offsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+ 
+     def start(self) -> None:
+         """Start the worker thread that drives the 100 Hz control loop."""
+         if self._thread is not None and self._thread.is_alive():
+             logger.warning("Move worker already running; start() ignored")
+             return
+         self._stop_event.clear()
+         self._thread = threading.Thread(target=self.working_loop, daemon=True)
+         self._thread.start()
+         logger.debug("Move worker started")
+ 
+     def stop(self) -> None:
+         """Request the worker thread to stop and wait for it to exit.
+ 
+         Before stopping, resets the robot to a neutral position.
+         """
+         if self._thread is None or not self._thread.is_alive():
+             logger.debug("Move worker not running; stop() ignored")
+             return
+ 
+         logger.info("Stopping movement manager and resetting to neutral position...")
+ 
+         # Clear any queued moves and stop the current move
+         self.clear_move_queue()
+ 
+         # Stop the worker thread first so it doesn't interfere
+         self._stop_event.set()
+         if self._thread is not None:
+             self._thread.join()
+         self._thread = None
+         logger.debug("Move worker stopped")
+ 
+         # Reset to neutral position using goto_target (same approach as wake_up)
+         try:
+             neutral_head_pose = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+             neutral_antennas = [0.0, 0.0]
+             neutral_body_yaw = 0.0
+ 
+             # Use goto_target directly on the robot
+             self.current_robot.goto_target(
+                 head=neutral_head_pose,
+                 antennas=neutral_antennas,
+                 duration=2.0,
+                 body_yaw=neutral_body_yaw,
+             )
+ 
+             logger.info("Reset to neutral position completed")
+ 
+         except Exception as e:
+             logger.error(f"Failed to reset to neutral position: {e}")
+ 
+     def get_status(self) -> Dict[str, Any]:
+         """Return a lightweight status snapshot for observability."""
+         with self._status_lock:
+             pose_snapshot = clone_full_body_pose(self._last_commanded_pose)
+             freq_snapshot = LoopFrequencyStats(
+                 mean=self._freq_snapshot.mean,
+                 m2=self._freq_snapshot.m2,
+                 min_freq=self._freq_snapshot.min_freq,
+                 count=self._freq_snapshot.count,
+                 last_freq=self._freq_snapshot.last_freq,
+                 potential_freq=self._freq_snapshot.potential_freq,
+             )
+ 
+         head_matrix = pose_snapshot[0].tolist() if pose_snapshot else None
+         antennas = pose_snapshot[1] if pose_snapshot else None
+         body_yaw = pose_snapshot[2] if pose_snapshot else None
+ 
+         return {
+             "queue_size": len(self.move_queue),
+             "is_listening": self._is_listening,
+             "breathing_active": self._breathing_active,
+             "last_commanded_pose": {
+                 "head": head_matrix,
+                 "antennas": antennas,
+                 "body_yaw": body_yaw,
+             },
+             "loop_frequency": {
+                 "last": freq_snapshot.last_freq,
+                 "mean": freq_snapshot.mean,
+                 "min": freq_snapshot.min_freq,
+                 "potential": freq_snapshot.potential_freq,
+                 "samples": freq_snapshot.count,
+             },
+         }
+ 
+     def working_loop(self) -> None:
+         """Main movement control loop - reproduces the main_works.py control architecture.
+ 
+         Issues a single set_target() call per tick, with pose fusion.
+         """
+         logger.debug("Starting enhanced movement control loop (100 Hz)")
+ 
+         loop_count = 0
+         prev_loop_start = self._now()
+         print_interval_loops = max(1, int(self.target_frequency * 2))
+         freq_stats = self._freq_stats
+ 
+         while not self._stop_event.is_set():
+             loop_start = self._now()
+             loop_count += 1
+ 
+             if loop_count > 1:
+                 freq_stats = self._update_frequency_stats(loop_start, prev_loop_start, freq_stats)
+             prev_loop_start = loop_start
+ 
+             # 1) Poll external commands and apply pending offsets (atomic snapshot)
+             self._poll_signals(loop_start)
+ 
+             # 2) Manage the primary move queue (start new move, end finished move, breathing)
+             self._update_primary_motion(loop_start)
+ 
+             # 3) Update vision-based secondary offsets
+             self._update_face_tracking(loop_start)
+ 
+             # 4) Build primary and secondary full-body poses, then fuse them
+             head, antennas, body_yaw = self._compose_full_body_pose(loop_start)
+ 
+             # 5) Apply listening antenna freeze or blend-back
+             antennas_cmd = self._calculate_blended_antennas(antennas)
+ 
+             # 6) Single set_target call - the only control point
+             self._issue_control_command(head, antennas_cmd, body_yaw)
+ 
+             # 7) Adaptive sleep to align to the next tick, then publish shared state
+             sleep_time, freq_stats = self._schedule_next_tick(loop_start, freq_stats)
+             self._publish_shared_state()
+             self._record_frequency_snapshot(freq_stats)
+ 
+             # 8) Periodic telemetry on loop frequency
+             self._maybe_log_frequency(loop_count, print_interval_loops, freq_stats)
+ 
+             if sleep_time > 0:
+                 time.sleep(sleep_time)
+ 
+         logger.debug("Movement control loop stopped")
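Note: the `mean`/`m2`/`count` fields of `LoopFrequencyStats` (whose definition is not part of this diff) suggest it accumulates loop-frequency samples with Welford's online algorithm, which is why `_maybe_log_frequency` can report variance as `m2 / count` without storing samples. A minimal, hypothetical sketch of that accumulator, not the actual class:

```python
from dataclasses import dataclass


@dataclass
class OnlineStats:
    """Welford accumulator: `mean` is the running mean, `m2` the running
    sum of squared deviations, so population variance = m2 / count."""
    mean: float = 0.0
    m2: float = 0.0
    count: int = 0

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)


stats = OnlineStats()
for freq_hz in [99.0, 100.0, 101.0]:
    stats.update(freq_hz)

# Same report shape as _maybe_log_frequency, without storing every sample
variance = stats.m2 / stats.count if stats.count > 0 else 0.0
```

The one-pass update is numerically stabler than accumulating `sum(x)` and `sum(x*x)` separately, which matters when thousands of near-identical 100 Hz samples are averaged between resets.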
pyproject.toml ADDED
@@ -0,0 +1,49 @@
+ [build-system]
+ requires = ["setuptools"]
+ build-backend = "setuptools.build_meta"
+ 
+ [project]
+ name = "moltbot-body"
+ version = "0.1.0"
+ description = "Moltbot's physical body - Reachy Mini integration with Clawdbot"
+ readme = "README.md"
+ requires-python = ">=3.12"
+ dependencies = [
+     # Reachy Mini SDK
+     "reachy-mini>=1.2.13",
+     "reachy_mini_dances_library",
+     "reachy_mini_toolbox",
+ 
+     # Audio
+     "numpy",
+     "scipy",
+     "soundfile",
+ 
+     # Whisper STT (faster-whisper uses CTranslate2, no numba dependency)
+     "faster-whisper",
+ 
+     # HTTP client for the Clawdbot gateway
+     "httpx",
+     "httpx-sse>=0.4.0",
+ 
+     # WebSocket for streaming TTS
+     "websockets>=12.0",
+ 
+     # Environment
+     "python-dotenv",
+ ]
+ 
+ [project.optional-dependencies]
+ dev = [
+     "pytest",
+     "ruff",
+ ]
+ 
+ [project.scripts]
+ moltbot-body = "moltbot_body.main:main"
+ 
+ [project.entry-points."reachy_mini_apps"]
+ moltbot-body = "moltbot_body.main:MoltbotBody"
+ 
+ [tool.setuptools.packages.find]
+ where = ["."]
style.css ADDED
@@ -0,0 +1,395 @@
+ :root {
+   --bg: #060c1d;
+   --panel: #0c172b;
+   --glass: rgba(17, 27, 48, 0.7);
+   --card: rgba(255, 255, 255, 0.04);
+   --accent: #7af5c4;
+   --accent-2: #f6c452;
+   --text: #e8edf7;
+   --muted: #9fb3ce;
+   --border: rgba(255, 255, 255, 0.08);
+   --shadow: 0 25px 70px rgba(0, 0, 0, 0.45);
+   font-family: "Space Grotesk", "Manrope", system-ui, -apple-system, sans-serif;
+ }
+ 
+ * {
+   margin: 0;
+   padding: 0;
+   box-sizing: border-box;
+ }
+ 
+ body {
+   background: radial-gradient(circle at 20% 20%, rgba(122, 245, 196, 0.12), transparent 30%),
+     radial-gradient(circle at 80% 0%, rgba(246, 196, 82, 0.14), transparent 32%),
+     radial-gradient(circle at 50% 70%, rgba(124, 142, 255, 0.1), transparent 30%),
+     var(--bg);
+   color: var(--text);
+   min-height: 100vh;
+   line-height: 1.6;
+   padding-bottom: 3rem;
+ }
+ 
+ a {
+   color: inherit;
+   text-decoration: none;
+ }
+ 
+ .hero {
+   padding: 3.5rem clamp(1.5rem, 3vw, 3rem) 2.5rem;
+   position: relative;
+   overflow: hidden;
+ }
+ 
+ .hero::after {
+   content: "";
+   position: absolute;
+   inset: 0;
+   background: linear-gradient(120deg, rgba(122, 245, 196, 0.12), rgba(246, 196, 82, 0.08), transparent);
+   pointer-events: none;
+ }
+ 
+ .topline {
+   display: flex;
+   align-items: center;
+   justify-content: space-between;
+   max-width: 1200px;
+   margin: 0 auto 2rem;
+   position: relative;
+   z-index: 2;
+ }
+ 
+ .brand {
+   display: flex;
+   align-items: center;
+   gap: 0.5rem;
+   font-weight: 700;
+   letter-spacing: 0.5px;
+   color: var(--text);
+ }
+ 
+ .logo {
+   display: inline-flex;
+   align-items: center;
+   justify-content: center;
+   width: 2.2rem;
+   height: 2.2rem;
+   border-radius: 10px;
+   background: linear-gradient(145deg, rgba(122, 245, 196, 0.15), rgba(124, 142, 255, 0.15));
+   box-shadow: 0 10px 30px rgba(0, 0, 0, 0.25);
+ }
+ 
+ .brand-name {
+   font-size: 1.1rem;
+ }
+ 
+ .pill {
+   background: rgba(255, 255, 255, 0.06);
+   border: 1px solid var(--border);
+   padding: 0.6rem 1rem;
+   border-radius: 999px;
+   color: var(--muted);
+   font-size: 0.9rem;
+   box-shadow: 0 12px 30px rgba(0, 0, 0, 0.2);
+ }
+ 
+ .hero-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
+   gap: clamp(1.5rem, 2.5vw, 2.5rem);
+   max-width: 1200px;
+   margin: 0 auto;
+   position: relative;
+   z-index: 2;
+   align-items: center;
+ }
+ 
+ .hero-copy h1 {
+   font-size: clamp(2.6rem, 4vw, 3.6rem);
+   margin-bottom: 1rem;
+   line-height: 1.1;
+   letter-spacing: -0.5px;
+ }
+ 
+ .eyebrow {
+   display: inline-flex;
+   align-items: center;
+   gap: 0.5rem;
+   text-transform: uppercase;
+   letter-spacing: 1px;
+   font-size: 0.8rem;
+   color: var(--muted);
+   margin-bottom: 0.75rem;
+ }
+ 
+ .eyebrow::before {
+   content: "";
+   display: inline-block;
+   width: 24px;
+   height: 2px;
+   background: linear-gradient(90deg, var(--accent), var(--accent-2));
+   border-radius: 999px;
+ }
+ 
+ .lede {
+   font-size: 1.1rem;
+   color: var(--muted);
+   max-width: 620px;
+ }
+ 
+ .hero-actions {
+   display: flex;
+   gap: 1rem;
+   align-items: center;
+   margin: 1.6rem 0 1.2rem;
+   flex-wrap: wrap;
+ }
+ 
+ .btn {
+   display: inline-flex;
+   align-items: center;
+   justify-content: center;
+   gap: 0.6rem;
+   padding: 0.85rem 1.4rem;
+   border-radius: 12px;
+   font-weight: 700;
+   border: 1px solid transparent;
+   cursor: pointer;
+   transition: transform 0.2s ease, box-shadow 0.2s ease, background 0.2s ease, border-color 0.2s ease;
+ }
+ 
+ .btn.primary {
+   background: linear-gradient(135deg, #7af5c4, #7c8eff);
+   color: #0a0f1f;
+   box-shadow: 0 15px 30px rgba(122, 245, 196, 0.25);
+ }
+ 
+ .btn.primary:hover {
+   transform: translateY(-2px);
+   box-shadow: 0 25px 45px rgba(122, 245, 196, 0.35);
+ }
+ 
+ .btn.ghost {
+   background: rgba(255, 255, 255, 0.05);
+   border-color: var(--border);
+   color: var(--text);
+ }
+ 
+ .btn.ghost:hover {
+   border-color: rgba(255, 255, 255, 0.3);
+   transform: translateY(-2px);
+ }
+ 
+ .btn.wide {
+   width: 100%;
+   justify-content: center;
+ }
+ 
+ .hero-badges {
+   display: flex;
+   flex-wrap: wrap;
+   gap: 0.6rem;
+   color: var(--muted);
+   font-size: 0.9rem;
+ }
+ 
+ .hero-badges span {
+   padding: 0.5rem 0.8rem;
+   border-radius: 10px;
+   border: 1px solid var(--border);
+   background: rgba(255, 255, 255, 0.04);
+ }
+ 
+ .hero-visual .glass-card {
+   background: rgba(255, 255, 255, 0.03);
+   border: 1px solid var(--border);
+   border-radius: 18px;
+   padding: 1.2rem;
+   box-shadow: var(--shadow);
+   backdrop-filter: blur(10px);
+ }
+ 
+ .architecture-preview {
+   background: rgba(0, 0, 0, 0.3);
+   border-radius: 14px;
+   border: 1px solid var(--border);
+   padding: 1.5rem;
+   overflow-x: auto;
+ }
+ 
+ .architecture-preview pre {
+   font-family: "SF Mono", "Fira Code", "Consolas", monospace;
+   font-size: 0.85rem;
+   color: var(--accent);
+   white-space: pre;
+   margin: 0;
+   line-height: 1.5;
+ }
+ 
+ .caption {
+   margin-top: 0.75rem;
+   color: var(--muted);
+   font-size: 0.95rem;
+ }
+ 
+ .section {
+   max-width: 1200px;
+   margin: 0 auto;
+   padding: clamp(2rem, 4vw, 3.5rem) clamp(1.5rem, 3vw, 3rem);
+ }
+ 
+ .section-header {
+   text-align: center;
+   max-width: 780px;
+   margin: 0 auto 2rem;
+ }
+ 
+ .section-header h2 {
+   font-size: clamp(2rem, 3vw, 2.6rem);
+   margin-bottom: 0.5rem;
+ }
+ 
+ .intro {
+   color: var(--muted);
+   font-size: 1.05rem;
+ }
+ 
+ .feature-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(240px, 1fr));
+   gap: 1rem;
+ }
+ 
+ .feature-card {
+   background: rgba(255, 255, 255, 0.03);
+   border: 1px solid var(--border);
+   border-radius: 16px;
+   padding: 1.25rem;
+   box-shadow: 0 10px 30px rgba(0, 0, 0, 0.2);
+   transition: transform 0.2s ease, border-color 0.2s ease, box-shadow 0.2s ease;
+ }
+ 
+ .feature-card:hover {
+   transform: translateY(-4px);
+   border-color: rgba(122, 245, 196, 0.3);
+   box-shadow: 0 18px 40px rgba(0, 0, 0, 0.3);
+ }
+ 
+ .feature-card .icon {
+   width: 48px;
+   height: 48px;
+   border-radius: 12px;
+   display: grid;
+   place-items: center;
+   background: rgba(122, 245, 196, 0.14);
+   margin-bottom: 0.8rem;
+   font-size: 1.4rem;
+ }
+ 
+ .feature-card h3 {
+   margin-bottom: 0.35rem;
+ }
+ 
+ .feature-card p {
+   color: var(--muted);
+ }
+ 
+ .story {
+   padding-top: 1rem;
+ }
+ 
+ .story-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
+   gap: 1rem;
+ }
+ 
+ .story-card {
+   background: rgba(255, 255, 255, 0.03);
+   border: 1px solid var(--border);
+   border-radius: 18px;
+   padding: 1.5rem;
+   box-shadow: var(--shadow);
+ }
+ 
+ .story-card.secondary {
+   background: linear-gradient(145deg, rgba(124, 142, 255, 0.08), rgba(122, 245, 196, 0.06));
+ }
+ 
+ .story-card h3 {
+   margin-bottom: 0.8rem;
+ }
+ 
+ .story-list {
+   list-style: none;
+   display: grid;
+   gap: 0.7rem;
+   color: var(--muted);
+   font-size: 0.98rem;
+ }
+ 
+ .story-list li {
+   display: flex;
+   gap: 0.7rem;
+   align-items: flex-start;
+ }
+ 
+ .story-text {
+   color: var(--muted);
+   line-height: 1.7;
+   margin-bottom: 1rem;
+ }
+ 
+ .chips {
+   display: flex;
+   flex-wrap: wrap;
+   gap: 0.5rem;
+ }
+ 
+ .chip {
+   padding: 0.45rem 0.8rem;
+   border-radius: 12px;
+   background: rgba(0, 0, 0, 0.2);
+   border: 1px solid var(--border);
+   color: var(--text);
+   font-size: 0.9rem;
+ }
+ 
+ .footer {
+   text-align: center;
+   color: var(--muted);
+   padding: 2rem 1.5rem 0;
+ }
+ 
+ .footer a {
+   color: var(--text);
+   border-bottom: 1px solid transparent;
+ }
+ 
+ .footer a:hover {
+   border-color: rgba(255, 255, 255, 0.5);
+ }
+ 
+ @media (max-width: 768px) {
+   .hero {
+     padding-top: 2.5rem;
+   }
+ 
+   .topline {
+     flex-direction: column;
+     gap: 0.8rem;
+     align-items: flex-start;
+   }
+ 
+   .hero-actions {
+     width: 100%;
+   }
+ 
+   .btn {
+     width: 100%;
+     justify-content: center;
+   }
+ 
+   .hero-badges {
+     gap: 0.4rem;
+   }
+ }
uv.lock ADDED
The diff for this file is too large to render.