Ship voice UX in your web app in minutes.

Streaming speech-to-text, voice commands, and best-in-class TTS. TypeScript-first SDK, <500ms streaming latency, no real-time infra to run.

  • Dictate: Production-ready dictation with clean output.
  • AI Edit: Voice edits that rewrite in place.
  • Commands: Intent to action across your UI + APIs.
  • TTS: In-app voice responses.
DICTATE

um so I just uh finished the quarterly report and like the main takeaway is that revenue grew 15 percent

OUTPUT

I just finished the quarterly report. The main takeaway is that revenue grew 15%.

3× faster than typing · <500ms latency

Fast enough to feel native. Reliable enough to ship.

TECHNICAL SPECIFICATIONS

INPUT SPEED

~3×

vs. mobile typing¹

ERROR RATE

20.4%

fewer errors¹

DICTATION

~160

WPM¹

LATENCY

<500

ms (streaming)

¹ Based on published HCI research; your results vary by device, environment, and model configuration.

BUILT ON BATTLE-TESTED INFRA

Deepgram - Speech-to-Text
Groq - Fast LLM inference
LiveKit - Real-time audio transport

Voice isn't "just STT." It's product UX.

Raw transcripts aren't enough. You need streaming, formatting, edit loops, commands, and voice output—without running real-time infra.

The old way

  • 01

    Use browser/OS voice APIs

    Inconsistent output, limited control, and brittle UX.

  • 02

    Stitch providers together

    STT → LLM → commands → TTS plus auth, streaming, retries, and UI glue.

  • 03

    Build voice in-house

    Months of edge cases, state, analytics, and maintenance.

  • 04

    Operate real-time infra

    Audio pipelines, websockets, scaling, and observability.

With SpeechOS

  • 01

    Drop in one SDK

    No infra to stitch or run — SpeechOS orchestrates the stack.

  • 02

    Dictate + Edit feels like one flow

    Users speak naturally, then refine instantly.

  • 03

    Commands trigger real actions

    Map intent to your UI + APIs.

  • 04

    TTS closes the loop

    Voice responses, prompts, and confirmations.

VOICE PRIMITIVES

Four voice primitives. One integration.

Everything you need to ship voice UX that users actually adopt.

01 · ~160 WPM

Dictate

Clean text by default.

Automatically polished text, ready to use without cleanup.

  • Punctuation + capitalization baked in
  • Filler removed for clean, written-style output
  • Works across notes, forms, editors, and comments

02 · 800+ WPM

AI Edit

Editing at the speed of speech.

Transform text instantly with simple voice edits.

  • "Make it shorter / more formal / more direct"
  • "Rewrite for clarity", "translate", "summarize"
  • Polish user-generated content in place

03 · INSTANT

Commands

Voice → real product actions.

Turn spoken intent into real UI + API actions.

  • Commands like "submit", "log activity", "create task", "next field"
  • Intent matching across natural phrasing
  • Track adoption and refine what people actually say

04 · TTS

TTS

Text-to-speech for in-app voice responses.

Speak confirmations, summaries, and guidance so your product can talk back in real time.

  • Voice confirmations after actions or submissions
  • Spoken prompts, updates, and status changes
  • Accessibility for users who prefer audio

HOW IT WORKS

Two ways to integrate

Use the automatic widget for instant voice UX, or call the SDK directly for full control.

Widget (automatic)

Initialize once. The widget appears automatically when users focus text inputs or select text. Handles dictation, editing, commands, and read-aloud with zero additional code.

  • Auto-detects input, textarea, contenteditable
  • Read-aloud on text selection
  • Works with React, Vue, vanilla JS
Preview the widget

SDK (programmatic)

Call the API directly for custom UIs or headless integrations. Full control over when and how voice actions are triggered.

  • dictate(), edit(), command(), tts.speak()
  • Build your own UI or go headless
  • React hooks: useDictation, useEdit, useTTS
See code examples

Try the Playground

Dictate, edit, trigger commands, and hear responses—no signup required.

USE CASES

Use cases that ship well in real products

See how developers are using SpeechOS to add voice UX to their applications.

Text-heavy workflows

Voice-enable forms, notes, and composers without sacrificing precision.

Editors & content

Draft and revise long-form text by voice, then polish fast.

Power-user workflows

Hands-free actions for navigation, submission, assignment, and status changes.

In-app voice responses

Let users listen to updates, guidance, and confirmations.

FOR DEVELOPERS

Ship voice UX in minutes

Install once. SpeechOS appears automatically across your app's text surfaces.

Prefer to customize? Configure commands, vocabulary, and UX behavior in the dashboard.

01

Install the SDK

Add SpeechOS to your project with a single command. Available via npm for any web app or with React bindings.

bash
# Any web app
npm install @speechos/client
 
# React bindings
npm install @speechos/react
02

Your app becomes voice-enabled

SpeechOS appears across text surfaces and handles dictation, edits, commands, and voice responses with one UI.

Dictate
Edit
Command
Read

REACT QUICKSTART

Add voice input in two steps

Use the useDictation hook to capture speech and get formatted text.

01 · Install
npm install @speechos/react
02 · Use the hook
import { useDictation } from '@speechos/react';

function VoiceNote() {
  const { start, stop, transcript } = useDictation();

  return (
    <>
      <button onClick={start}>Start</button>
      <button onClick={stop}>Stop</button>
      <p>{transcript}</p>
    </>
  );
}
03 · Try it
Press Start, speak, then press Stop

CODE EXAMPLES

Simple APIs for every action

Dictate, edit, run commands, and speak text—each in one line.

Dictate

Start recording, stop when done—get clean, formatted text back.

typescript
import { SpeechOS } from '@speechos/client';
 
// Start capturing speech.
SpeechOS.dictate();
 
// ... user speaks ...
 
// Stop and get the formatted transcript.
const text = await SpeechOS.stopDictation();
// Example: "The customer is blocked on SSO setup."

Edit with voice

Say "make it shorter" or "translate to French"—get rewritten text.

typescript
import { SpeechOS } from '@speechos/client';
 
// Provide the text to transform.
SpeechOS.edit('Your existing content here...');
 
// ... user says "make it more concise" ...
 
// Get the rewritten result.
const edited = await SpeechOS.stopEdit();

Voice commands

Map spoken intent to actions—"submit", "create task", "go back".

typescript
import { events } from '@speechos/client';
 
events.on('command:matched', ({ commands }) => {
  // Match spoken intent to real product actions.
  commands.forEach((cmd) => {
    if (cmd.name === 'submit') submitForm();
    if (cmd.name === 'create_task') createTask(cmd.arguments.title);
  });
});

Read Aloud (TTS)

Speak any text—great for in-app voice responses and accessibility.

typescript
import { tts } from '@speechos/client';
 
// Speak back to the user after an action.
await tts.speak('Your report is ready.');
 
// Use it for prompts, guidance, and confirmations.

Privacy-first by default

Built for real products where user data is sensitive and trust is non-negotiable.

Real-time streaming

Processed live; not stored by default.

You control the data

You remain the data controller; SpeechOS acts as a processor.

DPA available

For enterprise procurement and reviews.

Built for business use

Supports compliance-driven workflows and audits.

FAQ

Frequently Asked Questions

Common questions from developers integrating SpeechOS.

Is the client key safe to expose in the browser?
Yes. The SDK uses a public client key designed for browser use. Use domain allowlists and rate limits in the dashboard for control.

Which elements does the widget attach to?
Inputs and textareas by default; Read Aloud works on selections. For custom editors like ProseMirror, Slate, or Quill, use the manual attach APIs.

Can I disable automatic attachment?
Yes. Disable formDetection and attach to specific elements programmatically with SpeechOS.showFor() and SpeechOS.attachTo().

Is audio recorded or stored?
Audio is processed live and not stored by default. We store transcripts and metadata for analytics. Contact us for custom retention policies.

Is my data used to train models?
No. Customer data is not used for training models.

Which browsers are supported?
Chrome, Edge, Firefox, Safari, and mobile browsers on iOS and Android. The SDK uses the Web Audio API and WebRTC for real-time streaming.

How fast is it?
<500ms streaming latency for dictation. Typical AI edits complete in 1–2 seconds.

Is there a React integration?
Yes — @speechos/react with hooks like useDictation, useEdit, and useCommand for seamless React integration.

Ship voice UX without building a speech stack.

Dictation, AI edits, commands, and in-app voice responses in one SDK.

Prefer to build? View docs