Ship voice UX in your web app in minutes.
Streaming speech-to-text, voice commands, and best-in-class TTS. TypeScript-first SDK, <500ms streaming latency, no realtime infra to run.
- Dictate: Production-ready dictation with clean output.
- AI Edit: Voice edits that rewrite in place.
- Commands: Intent to action across your UI + APIs.
- TTS: In-app voice responses.
OUTPUT
Fast enough to feel native. Reliable enough to ship.
TECHNICAL SPECIFICATIONS
INPUT SPEED: ~3× vs. mobile typing¹
ERROR RATE: 20.4% fewer errors¹
DICTATION: ~160 WPM¹
LATENCY: <500 ms (streaming)
¹ Based on published HCI research; your results vary by device, environment, and model configuration.
BUILT ON BATTLE-TESTED INFRA
Voice isn't "just STT." It's product UX.
Raw transcripts aren't enough. You need streaming, formatting, edit loops, commands, and voice output—without running real-time infra.
The old way
- 01 Use browser/OS voice APIs: Inconsistent output, limited control, and brittle UX.
- 02 Stitch providers together: STT → LLM → commands → TTS plus auth, streaming, retries, and UI glue.
- 03 Build voice in-house: Months of edge cases, state, analytics, and maintenance.
- 04 Operate real-time infra: Audio pipelines, websockets, scaling, and observability.
With SpeechOS
- 01 Drop in one SDK: No infra to stitch or run; SpeechOS orchestrates the stack.
- 02 Dictate + Edit feels like one flow: Users speak naturally, then refine instantly.
- 03 Commands trigger real actions: Map intent to your UI + APIs.
- 04 TTS closes the loop: Voice responses, prompts, and confirmations.
VOICE PRIMITIVES
Four voice primitives. One integration.
Everything you need to ship voice UX that users actually adopt.
Dictate
Clean text by default.
Polished text automatically, ready to use without cleanup.
- Punctuation + capitalization baked in
- Filler removed for clean, written-style output
- Works across notes, forms, editors, and comments
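To make "clean, written-style output" concrete, here is a deliberately naive TypeScript sketch of the kind of post-processing involved: dropping filler words, capitalizing the first letter, and adding terminal punctuation. This page does not describe how SpeechOS actually formats transcripts, so treat this purely as an illustration; `cleanTranscript` and the filler list are hypothetical names, not part of the SDK.

```typescript
// Hypothetical illustration only: the real SpeechOS formatting pipeline is not documented here.
const FILLERS = new Set(["um", "uh", "er", "erm", "hmm"]);

function cleanTranscript(raw: string): string {
  // Split on whitespace and drop standalone filler words (ignoring a trailing comma or period).
  const words = raw
    .trim()
    .split(/\s+/)
    .filter((w) => w.length > 0 && !FILLERS.has(w.toLowerCase().replace(/[,.]+$/, "")));

  if (words.length === 0) return "";

  // Capitalize the first letter and make sure the text ends with punctuation.
  let text = words.join(" ");
  text = text.charAt(0).toUpperCase() + text.slice(1);
  if (!/[.!?]$/.test(text)) text += ".";
  return text;
}
```

A production system would also need to handle mid-sentence punctuation, numbers, and domain vocabulary, which is the part a hosted service is meant to take off your hands.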
AI Edit
Editing at the speed of speech.
Transform text instantly with simple voice edits.
- "Make it shorter / more formal / more direct"
- "Rewrite for clarity", "translate", "summarize"
- Polish user-generated content in place
Commands
Voice → real product actions.
Turn spoken intent into real UI + API actions.
- Commands like "submit", "log activity", "create task", "next field"
- Intent matching across natural phrasing
- Track adoption and refine what people actually say
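One way to wire matched commands to product actions is a lookup table keyed by command name. The sketch below assumes the `{ name, arguments }` command shape used in this page's `command:matched` example; `MatchedCommand`, `createDispatcher`, and the handlers are hypothetical names, and the SDK's real payload types may differ.

```typescript
// Assumed shape, based on the `command:matched` example on this page; actual SDK types may differ.
type MatchedCommand = { name: string; arguments?: Record<string, string> };
type Handler = (args: Record<string, string>) => void;

function createDispatcher(handlers: Record<string, Handler>) {
  // A Map avoids accidentally matching inherited keys such as "toString".
  const table = new Map<string, Handler>(Object.entries(handlers));

  // Runs each known command and returns the names that had no handler,
  // which can be logged to see what users ask for that is not supported yet.
  return function dispatch(commands: MatchedCommand[]): string[] {
    const unhandled: string[] = [];
    for (const cmd of commands) {
      const handler = table.get(cmd.name);
      if (handler) handler(cmd.arguments ?? {});
      else unhandled.push(cmd.name);
    }
    return unhandled;
  };
}

// Usage with stand-in actions (replace these with your own UI or API calls).
const actionLog: string[] = [];
const dispatch = createDispatcher({
  submit: () => { actionLog.push("submitted"); },
  create_task: (args) => { actionLog.push(`task:${args.title}`); },
});
```

Keeping the table separate from the event subscription makes the mapping easy to unit test without a microphone or a network connection.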
TTS
Text-to-speech for in-app voice responses.
Speak confirmations, summaries, and guidance so your product can talk back in real time.
- Voice confirmations after actions or submissions
- Spoken prompts, updates, and status changes
- Accessibility for users who prefer audio
HOW IT WORKS
Two ways to integrate
Use the automatic widget for instant voice UX, or call the SDK directly for full control.
Widget (automatic)
Initialize once. The widget appears automatically when users focus text inputs or select text. Handles dictation, editing, commands, and read-aloud with zero additional code.
- Auto-detects input, textarea, contenteditable
- Read-aloud on text selection
- Works with React, Vue, vanilla JS
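The auto-detection described above can be pictured as a predicate over the focused element. The following is a hypothetical sketch, not SpeechOS's actual detection logic: `isVoiceTarget`, the `ElementLike` type, and the list of text-like input types are all assumptions made for illustration.

```typescript
// Minimal structural type so the sketch runs without a DOM; browser elements expose the same fields.
type ElementLike = { tagName: string; type?: string; isContentEditable?: boolean };

// Assumption: only text-like <input> types should offer voice input.
const TEXT_INPUT_TYPES = new Set(["text", "search", "email", "url", "tel"]);

function isVoiceTarget(el: ElementLike): boolean {
  const tag = el.tagName.toUpperCase();
  if (tag === "TEXTAREA") return true;
  // An <input> without an explicit type behaves as type="text".
  if (tag === "INPUT") return TEXT_INPUT_TYPES.has((el.type ?? "text").toLowerCase());
  return el.isContentEditable === true;
}
```

In a browser, a check like this would typically run inside a `focusin` listener on the document, so that newly rendered fields are covered without per-field setup.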
SDK (programmatic)
Call the API directly for custom UIs or headless integrations. Full control over when and how voice actions are triggered.
- dictate(), edit(), command(), tts.speak()
- Build your own UI or go headless
- React hooks: useDictation, useEdit, useTTS
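When you build your own UI around the SDK calls, it can help to model the mic button as a small state machine. This is an illustrative sketch only: the states, events, and `nextState` function are hypothetical and not part of the SpeechOS SDK. The comments mark where calls such as `dictate()` and `stopDictation()` from this page's examples would plausibly go.

```typescript
type MicState = "idle" | "recording" | "processing" | "error";
type MicEvent = "press" | "release" | "result" | "fail" | "reset";

// Pure transition function: combinations not listed leave the state unchanged.
function nextState(state: MicState, event: MicEvent): MicState {
  switch (state) {
    case "idle":
      // On "press", the UI would start capture (e.g. dictate()).
      return event === "press" ? "recording" : state;
    case "recording":
      // On "release", the UI would stop capture (e.g. stopDictation()).
      if (event === "release") return "processing";
      return event === "fail" ? "error" : state;
    case "processing":
      if (event === "result") return "idle";
      return event === "fail" ? "error" : state;
    case "error":
      return event === "reset" ? "idle" : state;
  }
}
```

Because the function is pure, the same transitions can drive a React reducer, a Vue store, or a headless integration.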
Try the Playground
Dictate, edit, trigger commands, and hear responses—no signup required.
USE CASES
Use cases that ship well in real products
See how developers are using SpeechOS to add voice UX to their applications.
Text-heavy workflows
Voice-enable forms, notes, and composers without sacrificing precision.
Editors & content
Draft and revise long-form text by voice, then polish fast.
Power-user workflows
Hands-free actions for navigation, submission, assignment, and status changes.
In-app voice responses
Let users listen to updates, guidance, and confirmations.
FOR DEVELOPERS
Ship voice UX in minutes
Install once. SpeechOS appears automatically across your app's text surfaces.
Prefer to customize? Configure commands, vocabulary, and UX behavior in the dashboard.
Install the SDK
Add SpeechOS to your project with a single command. Available via npm for any web app or with React bindings.
    # Any web app
    npm install @speechos/client

    # React bindings
    npm install @speechos/react

Your app becomes voice-enabled
SpeechOS appears across text surfaces and handles dictation, edits, commands, and voice responses with one UI.
REACT QUICKSTART
Add voice input in two steps
Use the useDictation hook to capture speech and get formatted text.
    npm install @speechos/react

    import { useDictation } from '@speechos/react';

    function VoiceNote() {
      const { start, stop, transcript } = useDictation();
      return (
        <>
          <button onClick={start}>Start</button>
          <button onClick={stop}>Stop</button>
          <p>{transcript}</p>
        </>
      );
    }

CODE EXAMPLES
Simple APIs for every action
Dictate, edit, run commands, and speak text, each in a few lines of code.
Dictate
Start recording, stop when done—get clean, formatted text back.
    import { SpeechOS } from '@speechos/client';

    // Start capturing speech.
    SpeechOS.dictate();

    // ... user speaks ...

    // Stop and get the formatted transcript.
    const text = await SpeechOS.stopDictation();
    // Example: "The customer is blocked on SSO setup."

Edit with voice
Say "make it shorter" or "translate to French"—get rewritten text.
    import { SpeechOS } from '@speechos/client';

    // Provide the text to transform.
    SpeechOS.edit('Your existing content here...');

    // ... user says "make it more concise" ...

    // Get the rewritten result.
    const edited = await SpeechOS.stopEdit();

Voice commands
Map spoken intent to actions—"submit", "create task", "go back".
    import { events } from '@speechos/client';

    events.on('command:matched', ({ commands }) => {
      // Match spoken intent to real product actions.
      commands.forEach((cmd) => {
        if (cmd.name === 'submit') submitForm();
        if (cmd.name === 'create_task') createTask(cmd.arguments.title);
      });
    });

Read Aloud (TTS)
Speak any text—great for in-app voice responses and accessibility.
    import { tts } from '@speechos/client';

    // Speak back to the user after an action.
    await tts.speak('Your report is ready.');

    // Use it for prompts, guidance, and confirmations.

Privacy-first by default
Built for real products where user data is sensitive and trust is non-negotiable.
Real-time streaming
Audio is processed live and not stored by default.
You control the data
Your customer controls data; SpeechOS is a processor.
DPA available
For enterprise procurement and reviews.
Built for business use
Supports compliance-driven workflows and audits.
FAQ
Frequently Asked Questions
Common questions from developers integrating SpeechOS.
The SDK can be used directly in the browser: it uses a public client key designed for browser use. Use domain allowlists and rate limits in the dashboard for control.
The widget attaches to inputs and textareas by default; Read Aloud works on text selections. For custom editors like ProseMirror, Slate, or Quill, use the manual attach APIs.
You can control where the widget appears: disable formDetection and attach to specific elements programmatically with SpeechOS.showFor() and SpeechOS.attachTo().
Audio is processed live and not stored by default. We store transcripts and metadata for analytics. Contact us for custom retention policies.
Customer data is not used for training models.
Supported browsers: Chrome, Edge, Firefox, Safari, and mobile browsers on iOS and Android. The SDK uses the Web Audio API and WebRTC for real-time streaming.
Dictation streams with <500ms latency. Typical AI edits complete in 1–2 seconds.
React is supported through @speechos/react, with hooks like useDictation, useEdit, and useCommand.
Ship voice UX without building a speech stack.
Dictation, AI edits, commands, and in-app voice responses in one SDK.
Prefer to build? View docs