Talk API — Voice Input with AI Cleaning
The Talk API combines real-time speech-to-text with LLM-powered transcript cleaning. Send raw audio via WebSocket, get polished, publication-ready text back. Perfect for push-to-talk interfaces, voice notes, and dictation features.
Python
from sayd_ai import Sayd
client = Sayd(api_key="sk-your-key")
# Create a Talk session with config
session = client.talk.create(
    language="auto", # "en", "zh", or "multi"
    sample_rate=16000, # 8000 or 16000 Hz
    codec="pcm16", # "pcm16" or "opus"
    cleaning_level="standard", # "light", "standard", "aggressive"
    output_format="paragraph", # "paragraph", "bullets", "raw"
)
# Real-time streaming from microphone
for event in session.stream_microphone():
    if event.type == "partial":
        print(f"\r {event.text}", end="", flush=True)
    elif event.type == "sentence":
        print(f"\n[final] {event.text}")
    elif event.type == "cleaned":
        print("\n--- AI Cleaned ---")
        print(event.cleaned_text)
# Also supports streaming from a file:
# for event in session.stream_file("recording.wav"):
#     ...
print(f"Duration: {session.duration_minutes:.1f} min")
print(f"Cost: ${session.cost_usd:.4f}")
WebSocket Protocol
After creating a session via POST /v1/talk, connect to the returned WebSocket URL to stream audio and receive results.
Bash
# WebSocket Protocol — Talk API
## Connect
ws = WebSocket("wss://api.sayd.dev/v1/talk/stream/{session_id}?api_key=sk-...")
## Server Messages (you receive)
{"type": "ready"} # Session ready, start sending audio
{"type": "partial", "text": "..."} # Interim transcript (may change)
{"type": "sentence", "segments": []} # Final transcript segment
{"type": "cleaned", "text": "..."} # ✨ AI-cleaned result
{"type": "complete", ...} # Session complete
## Client Messages (you send)
[binary PCM16 frames] # Raw audio data
{"type": "end"} # Signal end of recording
{"type": "keepalive"} # Keep connection alive
## End Signal Behavior
# After receiving "end", the server continues accepting
# in-flight audio for up to 500ms (drain window), ensuring
# no trailing speech is lost. Then it waits for STT to
# stabilize and runs LLM cleaning. No client-side delay needed.
API Endpoints
POST  /v1/talk              Create a Talk session (returns WebSocket URL)
WS    /v1/talk/stream/{id}  Stream audio with AI cleaning
GET   /v1/talk              List Talk sessions
GET   /v1/talk/{id}         Get Talk session details & results
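For clients that talk to the WebSocket directly rather than through the SDK, the message flow described in the WebSocket Protocol section can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `pcm16_frames`, `TalkResult`, and `handle_message` are hypothetical helper names, the 20 ms frame duration and mono audio are assumptions (the protocol description does not state a frame size), and the contents of `segments` and of the `complete` message are stored or ignored as-is because their shape is not specified above. No network code is included; the helpers only prepare outgoing audio and interpret incoming JSON.

```python
import json
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


def pcm16_frames(audio: bytes, sample_rate: int = 16000,
                 frame_ms: int = 20) -> Iterator[bytes]:
    """Split mono PCM16 audio (2 bytes per sample) into fixed-duration chunks.

    The 20 ms default is an assumption, not a documented requirement.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


@dataclass
class TalkResult:
    """Accumulated client-side state for one session (hypothetical helper)."""
    ready: bool = False
    partial: str = ""
    sentences: List[list] = field(default_factory=list)
    cleaned: Optional[str] = None
    complete: bool = False


def handle_message(state: TalkResult, raw: str) -> TalkResult:
    """Apply one server JSON message to the state, keyed on its "type"."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "ready":
        state.ready = True          # safe to start sending audio
    elif kind == "partial":
        state.partial = msg.get("text", "")   # interim text, may change
    elif kind == "sentence":
        state.sentences.append(msg.get("segments", []))
        state.partial = ""          # the interim text is now final
    elif kind == "cleaned":
        state.cleaned = msg.get("text")
    elif kind == "complete":
        state.complete = True
    # Unknown message types are ignored so the client tolerates additions.
    return state


# Text message the client sends to signal the end of the recording.
END_MESSAGE = json.dumps({"type": "end"})
```

In a real client, each chunk from `pcm16_frames` would be sent as a binary WebSocket frame once `state.ready` is true, followed by `END_MESSAGE` as a text frame. Because the server drains in-flight audio after "end", the client can send it immediately after the last chunk.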