Woofun AI reports that Cartesia has deployed Sonic-3.5 and Ink-2, forming a unified real-time speech AI stack. Sonic-3.5 handles Text-to-Speech (TTS) and is built for low-latency generation, with a time to first audio of 90 milliseconds. It natively supports 42 languages and correctly handles English homographs and alphanumeric strings without any text preprocessing.
Ink-2 handles Speech-to-Text (STT), with a reported Word Error Rate (WER) of 3.6%. The model adds native turn detection and noise handling, inferring the user's intent (for example, whether they have finished speaking) from semantic context rather than from how long they have been silent. Ink-2 currently supports English only; multi-language support is planned for future updates. Developers can call both models through a single API that supports bidirectional interaction, which reduces the transmission latency and overhead of "multi-vendor stitching," that is, chaining STT and TTS services from separate providers.
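For context on the 3.6% figure: WER is conventionally computed as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference transcript into the model's output, and N is the number of words in the reference. A WER of 3.6% therefore means roughly 36 word errors per 1,000 reference words. The sketch below implements that standard edit-distance definition. It is illustrative only: the report does not say which benchmark, text normalization, or scoring tool Cartesia used, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER via word-level Levenshtein distance (illustrative sketch).

    Assumes whitespace tokenization and no text normalization; real
    benchmarks usually lowercase and strip punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )

    # (S + D + I) / N, with N = number of reference words
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors but N only counts reference words, WER can exceed 100% on very poor transcripts; it is an error rate, not a fraction of words that were wrong.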