Cartesia Deploys Sonic-3.5 and Ink-2 Models for Unified Real-Time Speech AI Stack
2026-06-16 19:02

Woofun AI reports that Cartesia has deployed Sonic-3.5 and Ink-2, two models that together form a unified real-time speech AI stack. Sonic-3.5 handles text-to-speech (TTS) with a focus on low-latency generation, achieving a time to first audio of 90 milliseconds. It natively supports 42 languages and correctly handles English homographs and alphanumeric strings without requiring text preprocessing.
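
"Time to first audio" measures how long a caller waits for the first chunk of a streamed response, not for the complete clip. The sketch below illustrates the difference under stated assumptions. `fake_tts_stream` is a hypothetical stand-in that simulates a streaming TTS response with fixed delays; it is not Cartesia's API, and real latency figures depend on network and deployment conditions.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_tts_stream(num_chunks: int = 5, first_delay: float = 0.09,
                    chunk_delay: float = 0.02) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (not a real API)."""
    time.sleep(first_delay)  # simulated wait before the first chunk
    for i in range(num_chunks):
        if i > 0:
            time.sleep(chunk_delay)  # simulated gap between later chunks
        yield b"\x00" * 320  # placeholder audio bytes


def measure_latency(stream: Iterable[bytes]) -> Tuple[float, float]:
    """Return (time to first audio, time to full audio) in seconds."""
    start = time.perf_counter()
    first = None
    for _chunk in stream:
        if first is None:
            # Latency until the caller can start playback.
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    if first is None:
        raise ValueError("stream produced no audio")
    return first, total


ttfa, total = measure_latency(fake_tts_stream())
print(f"first audio after {ttfa * 1000:.0f} ms, full audio after {total * 1000:.0f} ms")
```

Because playback can begin as soon as the first chunk arrives, time to first audio, rather than total synthesis time, is what a listener perceives as responsiveness in a voice application.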

Alongside it, Ink-2 handles speech-to-text (STT) with a word error rate (WER) of 3.6%. The model adds native turn detection and noise handling, judging whether a user has finished speaking from semantic context rather than from the length of a silence. Ink-2 currently supports only English; multi-language support is planned for future updates.

Developers can call both models through a single API that supports bidirectional interaction, reducing the transmission latency and overhead associated with "multi-vendor stitching," that is, chaining separate providers for speech recognition and speech synthesis.
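
Word error rate is the standard accuracy metric for STT: the word-level edit distance (substitutions S, deletions D, insertions I) between a model's transcript and a reference transcript, divided by the number of reference words N, so WER = (S + D + I) / N. A 3.6% WER therefore corresponds to roughly 36 word errors per 1,000 reference words. The sketch below is a generic illustration of that formula, not Cartesia's evaluation code; the function name `word_error_rate` is hypothetical, and real benchmarks usually normalize casing and punctuation first, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
print(f"{word_error_rate(reference, hypothesis):.3f}")  # prints 0.222 (2 substitutions / 9 words)
```

The dynamic-programming table keeps only two rows at a time, so memory grows with the hypothesis length rather than with the product of both transcript lengths.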

Disclaimer: Views are the author's own and do not represent the platform. Do not reproduce without permission. Content is for reference only, not investment advice. Trade at your own risk.