Deepgram Unveils Aura-2: The World’s Most Professional, Cost-Effective, and Enterprise-Grade Text-to-Speech Model

2025-04-15 20:31 (last updated 20:51)

SAN FRANCISCO--(BUSINESS WIRE)--Apr 15, 2025--

Deepgram, the leading voice AI platform for enterprise use cases, today announced Aura-2, its next-generation text-to-speech (TTS) model purpose-built for real-time voice applications in mission-critical business environments. Engineered for clarity, consistency, and low-latency performance, and deployable via cloud or on-premises APIs, Aura-2 enables developers to build scalable, human-like voice experiences for automated interactions across the enterprise, including customer support, virtual agents, and AI-powered assistants. Aura-2 is built on Deepgram Enterprise Runtime—the same infrastructure that powers the company’s industry-leading speech-to-text (STT) and speech-to-speech (STS) capabilities—providing enterprises with the control, adaptability, and performance required to deploy and scale production-grade voice AI. With Aura-2, Deepgram extends its leadership in enterprise speech technology to TTS, enabling businesses to deliver natural, responsive, and contextually accurate conversations at scale. Today, more than 200,000 developers and 1,200 companies, including Fortune 500 enterprises and voice AI startups like Jack in the Box, Vapi, and OneReach.ai, build on Deepgram.

Figure 1: User Preference for Enterprise Use Cases (Blinded Human Evals)

Figure 2: TTS Pricing Comparison – Aura-2 Advantage

Figure 3: Word Error Rate (WER) – Streaming STT

This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20250415446781/en/

"We’ve relied on Deepgram’s speech recognition to power real-time voice interactions at scale, so the opportunity to deploy TTS within the same enterprise-grade infrastructure is incredibly compelling," said Nikhil Gupta, CTO of Vapi. "Having both STT and TTS from a single provider significantly reduces integration complexity and latency, enabling smoother experiences for teams building conversational AI at scale."

"Aura-2’s remarkable clarity and naturalness significantly enhance our conversational AI solutions, making customer interactions smoother and more engaging,” said Thys Waanders, SVP of AI Transformation, Cognigy. “Deepgram’s ability to deliver real-time, domain-specific pronunciation at scale ensures we meet the complex needs of enterprise contact centers while maintaining efficiency and reducing costs."

Closing the Gap: Enterprise-Optimized Voice AI

In today’s TTS landscape, a significant gap exists between entertainment-focused models and the operational demands of enterprise-grade voice systems. While entertainment-focused TTS platforms are trained on and optimized for storytelling, character voices, and emotionally expressive delivery, they fall short when applied to enterprise use cases. Enterprise applications require more than natural-sounding voices—they demand domain-specific pronunciation, a professional tone, consistent contextual handling, and the ability to perform reliably, cost-effectively, and securely—often in environments that require full deployment control.

Aura-2 bridges this divide, delivering high-quality, context-aware speech designed for the scale, precision, and resilience that business-critical environments demand. Unlike entertainment-focused systems optimized for creative expression, Aura-2 reflects the priorities of enterprise voice AI, delivering benefits across key dimensions:

Domain-Specific Pronunciation Excellence – Aura-2 ensures precise handling of industry terminology, accurately pronouncing healthcare terms, financial jargon, product names, and complex numerals without special tagging. This built-in accuracy eliminates the need for extensive pronunciation dictionaries or manual intervention, ensuring clear communication in specialized fields where precision matters most.

Professional Voice Quality & Naturalness – With 40+ distinct voices spanning U.S. English and localized accents, Aura-2 delivers authentic, business-appropriate speech that avoids the overly theatrical tones common in entertainment-focused TTS. Organizations can select consistent voice personas—from "empathetic and charismatic" to "calm and professional"—that align with their brand identity across all customer touchpoints. Support for additional languages is already in development to further expand global reach.

Context-Aware Delivery – Aura-2 intelligently adjusts pacing, pauses, tone, and expression based on context—whether delivering a phone number, handling a support escalation, or navigating a transactional interaction. The result is smooth, coherent speech with uniform volume and crisp articulation throughout.

These voice and delivery advantages translate into real user preference. In head-to-head comparisons across enterprise scenarios, Deepgram came out on top nearly 60% of the time.

Real-Time Performance at Scale – Aura-2 is optimized for real-world enterprise workloads, delivering sub-200ms time-to-first-byte (TTFB) for ultra-responsive interactions. It efficiently supports thousands of concurrent requests while maintaining consistently low latency and high-quality speech output across high-volume deployments—from call centers to virtual assistants. For teams with strict security or data residency requirements, deploying Aura-2 on-premises or in a VPC not only ensures full control—it can also reduce latency by eliminating round trips to the cloud.

Cost-Effectiveness at Scale – Aura-2 delivers enterprise-grade speech with transparent pricing optimized for volume. At $0.030 per 1,000 characters, it offers substantial savings compared to alternatives like ElevenLabs Turbo ($0.050) and Cartesia Sonic ($0.038). Deepgram's usage-based model includes all 40+ voices at a single rate with no hidden fees and offers tiered enterprise pricing to significantly reduce costs for high-volume implementations. This approach eliminates quality/cost tradeoffs, enabling consistent voice experiences across all touchpoints without sacrificing performance to control costs.
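The per-character rates above translate directly into cost at volume. The following is a minimal sketch of that arithmetic, using only the three list prices quoted in this release. The function names and the 10-million-character monthly volume are illustrative assumptions, and tiered enterprise discounts are not modeled.

```python
# List prices in USD per 1,000 characters, as quoted in this release.
PRICE_PER_1K_CHARS = {
    "Aura-2": 0.030,
    "ElevenLabs Turbo": 0.050,
    "Cartesia Sonic": 0.038,
}


def monthly_cost(chars: int, price_per_1k: float) -> float:
    """Cost in USD for `chars` characters at a flat per-1,000-character rate."""
    return chars / 1000 * price_per_1k


def savings_vs(baseline: str, other: str) -> float:
    """Fractional saving of `baseline` relative to `other` (0.4 means 40% cheaper)."""
    return 1 - PRICE_PER_1K_CHARS[baseline] / PRICE_PER_1K_CHARS[other]


if __name__ == "__main__":
    volume = 10_000_000  # hypothetical monthly character volume
    for name, price in PRICE_PER_1K_CHARS.items():
        print(f"{name}: ${monthly_cost(volume, price):,.2f}")
    for other in ("ElevenLabs Turbo", "Cartesia Sonic"):
        print(f"Aura-2 vs {other}: {savings_vs('Aura-2', other):.1%} cheaper")
```

At 10 million characters per month, the quoted rates work out to $300 for Aura-2, $500 for ElevenLabs Turbo, and $380 for Cartesia Sonic, a saving of 40% and roughly 21% respectively.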

“Our customers need more than just voices that sound good; they need voices that communicate precisely and reliably in professional contexts,” said Scott Stephenson, CEO of Deepgram. “Aura-2 delivers the perfect balance of natural speech and enterprise-grade accuracy, enabling organizations to create voice experiences that truly enhance customer engagement while maintaining operational efficiency.”

"Aura-2 sets a new bar for enterprise-grade TTS. The clarity, consistency, and low latency it delivers have been game changers for our AI agent experiences," said Bernardo Aceituno, Co-Founder at Stack AI. "With Deepgram's voice synthesis, we're able to build workflows that not only sound more human but also perform with the reliability enterprises demand."

"We chose Deepgram because it delivers both STT and TTS with the speed, cost-efficiency, and accuracy we need to support real-time interactions at scale," said Caesar Gui, CEO, LockedIn AI. "Aura-2’s responsiveness and quality let us create AI agents that feel natural in conversation—and having one provider across the voice stack means faster iteration and fewer integration headaches.”

Enterprise-Grade Architecture for Real-Time Applications

Aura-2 is powered by Deepgram Enterprise Runtime (DER), a custom-built infrastructure layer that runs all of Deepgram’s speech models. Designed specifically for enterprise-grade performance, DER orchestrates voice AI in real time with the speed, reliability, and adaptability required for production-scale deployments.

By running on DER, Aura-2 inherits an enterprise-grade foundation built for mission-critical performance. This architectural advantage means organizations can deploy advanced TTS capabilities while maintaining the same operational standards for security, reliability, and scalability that define Deepgram's trusted platform. Unlike providers limited to cloud-only deployments, Deepgram offers true deployment flexibility—with symmetric performance across cloud, VPC, and on-premises environments—so enterprises can meet security and infrastructure requirements without tradeoffs. Rather than managing separate systems with different operational characteristics, enterprises gain a cohesive voice AI infrastructure designed for production environments.

Deepgram's STT Leadership Strengthens TTS Capabilities

Deepgram's proven leadership in STT gives Aura-2 a distinct advantage in delivering accurate, production-ready TTS. By running on the same enterprise runtime that powers Nova-3 for speech recognition and the Voice Agent API for conversational AI, Aura-2 benefits from shared learning, unified deployment, and a seamless developer experience. This deep integration across Deepgram's voice AI stack eliminates the operational complexity and debugging challenges that typically arise from stitching together tools from multiple vendors.

"Our years developing Nova-3 and other STT models gave us deep insight into real-world speech patterns," said Natalie Rutgers, VP of Product at Deepgram. "With the Enterprise Runtime, Aura-2 directly leverages our acoustic models and pronunciation datasets to deliver precise, industry-specific speech synthesis in real time."

This unified architecture enables continuous cross-model learning, where improvements in speech recognition automatically enhance speech synthesis through the shared runtime. As the platform learns and adapts to your specific industry terminology and user interactions, it transforms isolated voice components into a cohesive voice AI platform that strengthens with every interaction. The result for enterprises is measurably better performance: consistent pronunciation across systems, reduced end-to-end latency, and real-time model customization—all with the same platform reliability that has made Deepgram the gold standard in voice AI infrastructure.

See Aura-2 in Action

Start building with enterprise-grade TTS today. Experience Aura-2 instantly through our interactive playground or explore in-depth product capabilities at deepgram.com. New users receive $200 in free credits—enough to generate over 13 million characters (~220 hours of speech). Take the first step toward transforming your voice applications with Deepgram's industry-leading technology.
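The free-credit figures can be sanity-checked with simple arithmetic. The sketch below uses only numbers quoted in this release ($200, 13 million characters, 220 hours, and the $0.030 list rate). The function names are illustrative, and the release does not state which rate applies to free credits, so the results are implied values rather than published ones.

```python
# Figures quoted in this release.
FREE_CREDIT_USD = 200.0
QUOTED_CHARS = 13_000_000      # "over 13 million characters"
QUOTED_HOURS = 220             # "~220 hours of speech"
LIST_PRICE_PER_1K = 0.030      # USD per 1,000 characters


def implied_price_per_1k(credit_usd: float, chars: int) -> float:
    """Per-1,000-character rate implied by spending `credit_usd` on `chars` characters."""
    return credit_usd / chars * 1000


def chars_per_minute(chars: int, hours: float) -> float:
    """Average speaking pace implied by `chars` characters over `hours` of audio."""
    return chars / (hours * 60)


def chars_at_rate(credit_usd: float, price_per_1k: float) -> float:
    """Characters that `credit_usd` buys at a flat per-1,000-character rate."""
    return credit_usd / price_per_1k * 1000


if __name__ == "__main__":
    print(f"Implied rate: ${implied_price_per_1k(FREE_CREDIT_USD, QUOTED_CHARS):.4f} per 1,000 chars")
    print(f"Implied pace: {chars_per_minute(QUOTED_CHARS, QUOTED_HOURS):.0f} chars per minute")
    print(f"Chars at list rate: {chars_at_rate(FREE_CREDIT_USD, LIST_PRICE_PER_1K):,.0f}")
```

With these inputs, the credit figure implies a rate of at most about $0.0154 per 1,000 characters and a pace of roughly 985 characters per minute. At the $0.030 list rate quoted earlier, $200 would cover about 6.7 million characters.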

About Deepgram

Deepgram is the leading voice AI platform for enterprise use cases, offering speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) capabilities, all powered by our enterprise-grade runtime. More than 200,000 developers build with Deepgram’s voice-native foundational models, accessed through cloud APIs or as self-hosted / on-premises APIs, because of our unmatched accuracy, low latency, and pricing. Customers include technology ISVs building voice products or platforms, co-sell partners working with large enterprises, and enterprises solving internal use cases. Deepgram has processed over 50,000 years of audio and transcribed over 1 trillion words, and no organization in the world understands voice better. To learn more, visit www.deepgram.com, read our developer docs, or follow @DeepgramAI on X and LinkedIn.

KYIV, Ukraine (AP) — Russian drones blasted apartment buildings and the power grid in the southern Ukraine city of Odesa in an overnight attack that injured six people, including a toddler and two other children, officials said Wednesday.

Four apartment buildings were damaged in the bombardment, according to regional military administration head Oleh Kiper. Power company DTEK said two of its energy facilities suffered significant damage. The company said that 10 substations that distribute electricity in the Odesa region were damaged in December alone.

Russia has this year escalated its long-range attacks on urban areas of Ukraine. In recent months, as Russia’s invasion of its neighbor approaches its four-year milestone in February, it has also intensified its targeting of energy infrastructure, seeking to deny Ukrainians heat and running water in the bitter winter months.

From January to November this year, more than 2,300 Ukrainian civilians were killed and more than 11,000 were injured, the United Nations said earlier this month. That was 26% higher than in the same period in 2024 and 70% higher than in 2023, it said.

Russia’s sustained drone and missile attacks have taken place against the backdrop of renewed diplomatic efforts to stop the fighting.

U.S. President Donald Trump hosted Ukrainian President Volodymyr Zelenskyy at his Florida resort on Sunday and announced that a settlement is “closer than ever before." The Ukrainian leader is due to hold talks next week with the heads of European governments supporting his efforts to secure acceptable terms.

The ongoing attacks, meantime, are inflaming tensions.

The overnight Odesa strikes “are further evidence of the enemy’s terror tactics, which deliberately target civilian infrastructure,” Kiper, the regional head, said.

Moscow has alleged that Ukraine attempted to attack Russian President Vladimir Putin’s residence in northwestern Russia with 91 long-range drones late Sunday and early Monday. Ukrainian officials deny the claim and say it’s a ruse to derail progress in the peace negotiations.

Maj. Gen. Alexander Romanenkov of the Russian air force claimed Wednesday that the drones took off from Ukraine’s Sumy and Chernihiv regions.

At a briefing where no questions were allowed, he presented a map showing the drone flight routes before they were downed by Russian air defenses over the Bryansk, Tver, Smolensk and Novgorod regions.

It was not possible to independently verify the reports.

The European Union’s foreign policy chief, Kaja Kallas, on Wednesday called the Russian allegations “a deliberate distraction” from the peace talks.

“No one should accept unfounded claims from the aggressor who has indiscriminately targeted Ukraine’s infrastructure and civilians since the start of the war,” Kallas posted on X.

Zelenskyy said Wednesday that Romania and Croatia are the latest countries to join a fund that buys weapons for Ukraine from the United States. The financial arrangement, known as the Prioritized Ukraine Requirements List, or PURL, pools contributions from NATO members, except the United States, to purchase American weapons, munitions and equipment.

Twenty-four countries have joined the fund since it was established in August, according to Zelenskyy. The fund has so far received $4.3 billion, with almost $1.5 billion coming in December alone, he said on social media.

Ukraine’s air force said Wednesday that Russia fired 127 drones at the country during the night, with 101 of them intercepted by air defenses.

Meanwhile, the Russian Defense Ministry said that 86 Ukrainian drones were shot down overnight over Russian regions, the Black Sea and the illegally annexed Crimea peninsula.

The Ukrainian attack started a fire at an oil refinery in Russia's southern Krasnodar region, but it was quickly put out, local authorities said.

This story has corrected the day of the alleged Ukrainian drone attack on the Russian president’s residence to late Sunday and early Monday.

Follow AP’s coverage of the war in Ukraine at https://apnews.com/hub/russia-ukraine

In this photo provided by the Ukrainian Emergency Service, emergency services personnel work to extinguish a fire following a Russian attack in Odesa, Ukraine, Wednesday, Dec. 31, 2025. (Ukrainian Emergency Service via AP)

In this image made from video provided by the Russian Defense Ministry Press Service on Tuesday, Dec. 30, 2025, a Russian Army soldier fires a D-30 howitzer towards Ukrainian positions in an undisclosed location in Ukraine. (Russian Defense Ministry Press Service via AP)
