Image generation pioneer sets sights on a revolution in text AI
Stefano Ermon, one of the creators of the diffusion model technology behind image generators such as Stable Diffusion and DALL-E, unveiled a new development fro
AI-processed from Bloomberg Tech; edited by Hamidun News
When a scientist whose ideas have shaped an entire industry of generative media decides to pivot to text AI, the market should pay attention. Stefano Ermon, a professor at Stanford University and one of the leading researchers in diffusion models, has unveiled a technology through his startup Inception that promises to significantly accelerate text-based AI systems—from chatbots to corporate assistants.
To understand the scope of this move, we need to recall who Ermon is and why his name carries such weight. His research into score-based generative models became one of the foundations upon which Stable Diffusion, DALL-E, and dozens of other image and video creation services were built. Diffusion models—the very technology that enables converting text descriptions into photorealistic images—owe much of their existence to Ermon's work and that of his colleagues. This is not merely an academic contribution: we're talking about technology that generates billions of dollars in revenue for companies worldwide.
Now Ermon is targeting territory firmly held by OpenAI, Google, Anthropic, and Meta: natural language processing. His startup Inception, about which little was known until recently, has introduced technology capable of accelerating text generation in language models. Details remain incomplete, but according to Bloomberg the approach represents a fundamentally new take on inference, the stage at which a trained model produces answers for users in real time.
Inference speed is one of the central challenges in the large language model industry. Every time you ask ChatGPT or Claude a question, the model generates its answer token by token, a process requiring enormous computational resources. Companies spend billions of dollars on GPU clusters to ensure acceptable response times for hundreds of millions of users. Any technology capable of cutting generation time by even tens of percent has enormous economic value. This is why dozens of startups and research labs are now focused on inference optimization, from Groq with its specialized chips to software techniques such as quantization and speculative decoding.
That Ermon brings experience from the world of diffusion models to this race may prove to be an unexpected advantage. Diffusion models work fundamentally differently from autoregressive transformers: instead of generating tokens one after another, they start from noise and iteratively refine the whole output. Researchers have been experimenting for years with transferring diffusion principles to text generation, and some results look promising. If Inception has found a way to apply these ideas to speed up text models in practice, it could represent a true breakthrough: not an evolutionary improvement, but a paradigm shift.
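The contrast between the two generation styles can be illustrated with a toy sketch. This is not Inception's method, whose details have not been published. It only counts sequential steps under simplified assumptions: `toy_predict` is a hypothetical stand-in for a real model, the "noise" is a fully masked sequence, and each refinement step simply fills in a batch of positions at once.

```python
import random

VOCAB = ["the", "model", "writes", "text", "fast"]
MASK = "<mask>"


def toy_predict(position):
    # Hypothetical stand-in for a model's prediction at one position.
    return VOCAB[position % len(VOCAB)]


def autoregressive_generate(length):
    """Produce tokens left to right: one sequential step per token."""
    tokens, steps = [], 0
    for position in range(length):
        tokens.append(toy_predict(position))
        steps += 1
    return tokens, steps


def diffusion_style_generate(length, num_steps, seed=0):
    """Start fully masked and fill in a batch of positions per step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # positions are resolved in arbitrary order
    per_step = max(1, -(-length // num_steps))  # ceiling division
    steps = 0
    for start in range(0, length, per_step):
        # All positions in this batch are updated within a single step.
        for position in order[start:start + per_step]:
            tokens[position] = toy_predict(position)
        steps += 1
    return tokens, steps


ar_tokens, ar_steps = autoregressive_generate(20)
diff_tokens, diff_steps = diffusion_style_generate(20, num_steps=4)
print("autoregressive steps:", ar_steps)
print("diffusion-style steps:", diff_steps)
```

In this toy setup both functions produce the same 20 tokens, but the second needs far fewer sequential steps. A real system would have to show that answer quality survives this kind of parallel refinement, which the sketch does not model.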
That said, skepticism is warranted. The AI startup market is flooded with ambitious claims, and many of them do not hold up at scale. It's one thing to demonstrate impressive results in a lab setting; quite another to deploy a technology for millions of users while maintaining answer quality. Major players like OpenAI and Google possess not only the most powerful infrastructure but also massive teams of engineers who have refined their systems over years. Competing with them on their own turf is a task of a completely different order than publishing a research paper.
Nevertheless, Ermon's reputation and track record make Inception one of the most interesting startups in the current landscape. The market for AI inference infrastructure is valued at tens of billions of dollars and growing rapidly. If Inception's technology actually works, the company has several strategic paths: licensing to major providers, creating its own API service, or, equally likely, acquisition by one of the tech giants.
The story of Inception also reflects a broader trend: the boundaries between different areas of generative AI are blurring. Ideas born in the world of images migrate to text, and vice versa. Multimodality is ceasing to be merely a marketing term and becoming an engineering reality. If a scientist who transformed image generation can have a similarly radical influence on text processing, it will be the best evidence that the AI industry is still far from maturity, and that the most interesting breakthroughs may lie ahead.