Cohere releases an open-source transcription model: 2 billion parameters, 14 languages
Cohere released an open-source speech model built specifically for transcription. With just 2 billion parameters, it is designed to run on a standard…
AI-processed from TechCrunch; edited by Hamidun News
Cohere has released an open-source speech transcription model. At 2 billion parameters, the model is deliberately compact so that it can run on a regular consumer GPU instead of expensive server clusters or cloud APIs. The company positions it as a tool for developers who want to run transcription on their own infrastructure.
Cohere is a Canadian AI company founded in 2019 by former Google Brain researchers. Until now it has been known mainly as a supplier of enterprise language models: its flagship Command model competes with GPT-4 and Claude in the enterprise segment, and its Embed embedding models are used in thousands of production applications for semantic search. Voice tools are a new direction for the company, and it has entered with a specialized product: rather than a general-purpose multimodal model, it released a tool built for a single task.
The automatic speech recognition market is in transition. Historically, it was dominated by technology giants: Google Speech-to-Text, Amazon Transcribe, and Microsoft Azure Speech. All of them follow a cloud model: audio is sent to the provider's servers, processed there, and returned as text. This creates two problems: costs that grow with volume, and privacy concerns that are critical in certain industries. Large providers profit from scale, but for startups and mid-size companies, cloud transcription quickly becomes a significant expense.
A turning point came in 2022, when OpenAI released Whisper, an open-source transcription model that can be run locally. Whisper changed the market: many developers switched to self-hosted transcription, and faster reimplementations such as faster-whisper (built on CTranslate2) appeared alongside lightweight distilled versions. However, Whisper has known limitations. The large versions require a GPU with 8–10 GB of VRAM, and the model family has seen only incremental updates, such as the faster Large v3 Turbo variant, since the Large v3 release in 2023. That leaves room for a credible alternative, and this is the gap Cohere's model aims to fill.
The 2-billion-parameter size is not a compromise but a deliberate bet on accessibility. For comparison, Whisper Large v3, widely considered the quality benchmark, has 1.5 billion parameters and requires at least 8 GB of VRAM in half precision. Cohere's model is slightly larger by parameter count but, judging by its stated compatibility with consumer GPUs, appears to be better optimized for running outside a data center. Support for 14 languages covers most production scenarios for global companies.
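As a rough back-of-envelope check on this size comparison, the memory needed just to hold a model's weights is approximately the parameter count times the bytes per parameter. The sketch below is an illustration, not a measurement: it assumes 2 bytes per parameter for half precision, uses the parameter counts stated above (Whisper Large v3 rounded to 1.5 billion), reports decimal gigabytes, and ignores activations, decoding caches, and runtime overhead. That overhead is why real-world requirements such as the 8 GB cited for Whisper are higher than the weights alone.

```python
# Back-of-envelope estimate of GPU memory needed to store model weights only.
# Assumption: memory ~= parameters x bytes per parameter. Activations,
# decoding caches and framework overhead are deliberately ignored.

BYTES_PER_PARAM = {
    "fp32": 4,  # full precision
    "fp16": 2,  # half precision
    "int8": 1,  # 8-bit quantization
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Parameter counts as stated in the article.
MODELS = {
    "Cohere (2B)": 2.0e9,
    "Whisper Large v3": 1.5e9,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        for precision in ("fp32", "fp16", "int8"):
            size = weight_memory_gb(params, precision)
            print(f"{name:18s} {precision}: {size:4.1f} GB")
```

In half precision this gives about 4 GB of weights for a 2-billion-parameter model and about 3 GB for Whisper Large v3, so on a consumer GPU the practical difference depends largely on runtime overhead and inference optimizations, which only independent testing can quantify.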
Open-source status is also a matter of privacy. Companies in the financial, medical, legal, and government sectors cannot simply send sensitive conversations and recordings to third-party servers. Regulations such as HIPAA, GDPR, Russia's Federal Law 152-FZ on personal data, and similar laws require control over how data is processed. Self-hosted transcription removes this barrier: audio is processed locally and never leaves the organization's infrastructure.
Until now, Whisper, with its production limitations, has been the most widely adopted mature option for such scenarios. Publishing an open model is also a strategic move by Cohere. A free model draws developers into the company's ecosystem, can create future reliance on its paid enterprise cloud products as those businesses scale, and builds Cohere's reputation as a trustworthy partner.
This is the same logic Meta applies with Llama and Mistral with its open models: first build trust through openness, then monetize through enterprise offerings.
Independent benchmarks are expected in the coming weeks. For now, it is unclear how the model performs with heavy background noise, strong accents, and specialized terminology.
If its accuracy proves comparable to Whisper Large v3, it could significantly shift the balance of power in open-source transcription. Developers building meeting transcription systems, call-center software, medical documentation tools, or voice-note apps should add Cohere's model to their list of candidates for testing.