Do New AI Algorithms Pose A Bear Case for ASICs?
AI algorithms keep improving rapidly. What does it mean for companies building ASICs?
Welcome to Infinite Curiosity, a newsletter that explores the intersection of Artificial Intelligence and Startups. Tech enthusiasts across 200 countries have been reading what I write. Subscribe to this newsletter for free to directly receive it in your inbox:
The rapid evolution of AI is reshaping not only how we interact with technology but also the hardware that powers it. At the heart of this transformation lies algorithm development. It’s the process of designing computational methods that enable machines to perform complex tasks like language processing and image generation.
Gavin Baker recently posted the following on X in response to the release of diffusion-based LLMs, and it made me think about this topic:
Application-Specific Integrated Circuits (ASICs) have long been the backbone of AI’s hardware ecosystem. They offer unmatched efficiency for specific workloads. But emerging algorithms such as diffusion-based LLMs present a bear case for their future relevance.
This essay attempts to answer a simple question: why exactly is it a bear case for ASICs? I explore how algorithm development in AI challenges the rigidity of ASICs, why diffusion LLMs exacerbate this issue, and what solutions could help the hardware keep pace with innovation.
Understanding Algorithm Development in AI
Algorithm development in AI involves crafting and refining the computational frameworks that allow machines to mimic human-like capabilities. Historically, LLMs have relied on autoregressive approaches where text tokens are predicted sequentially from left to right. This method is optimized for efficiency and scalability. And it has been the foundation of models like GPT, driving their deployment on existing hardware.
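To make the sequential nature of this approach concrete, here is a minimal, self-contained sketch of autoregressive decoding. The "model" is a toy stand-in that scores a tiny vocabulary at random, not a real LLM; the point is the one-token-at-a-time control flow.

```python
import random

# Toy sketch of autoregressive decoding: tokens are produced one at a time,
# left to right, each step conditioned on everything generated so far.
# `toy_model` is a stand-in (random scores over a tiny vocabulary), not a
# real LLM; only the sequential control flow matters here.

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_model(prefix):
    # Pretend forward pass: return a score for every vocabulary token.
    return [random.random() for _ in VOCAB]

def generate_autoregressive(prefix, max_new_tokens=8):
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        scores = toy_model(tokens)                     # one forward pass per token
        next_token = VOCAB[scores.index(max(scores))]  # greedy pick
        tokens.append(next_token)                      # sequence grows left to right
    return tokens

print(generate_autoregressive(["the", "cat"]))
```

Each new token requires its own forward pass over the growing prefix, which is exactly the access pattern that today's inference hardware has been tuned for.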
But a new wave of innovation is disrupting this paradigm. Diffusion-based LLMs adopt a fundamentally different strategy (I have written about it here). Inspired by the success of diffusion models in image and video generation, they begin with random noise and iteratively refine it into coherent output through a non-autoregressive, parallel process.
This shift promises faster and more versatile text generation, but it departs significantly from the sequential logic of autoregressive models. That places new demands on the hardware that supports AI.
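Here is a similarly toy sketch of the diffusion-style alternative, for contrast. It starts from a fully "noisy" (masked) sequence and refines every position in parallel over a fixed number of denoising steps. Real diffusion LLMs are far more involved; this only illustrates the parallel, coarse-to-fine control flow.

```python
import random

# Toy sketch of diffusion-style text generation: start from an all-masked
# sequence and refine every position together over a few denoising steps.
# `toy_denoiser` is a stand-in that proposes random tokens; a real model
# would propose tokens from a learned distribution.

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_denoiser(sequence):
    # Pretend denoising pass: propose a token for every position at once.
    return [random.choice(VOCAB) for _ in sequence]

def generate_diffusion_style(seq_len=6, num_steps=4):
    sequence = ["<mask>"] * seq_len                  # start from pure "noise"
    masked = list(range(seq_len))
    for step in range(num_steps):
        proposals = toy_denoiser(sequence)           # every position proposed at once
        # Commit a fraction of the still-masked positions each step,
        # in no particular left-to-right order (coarse to fine).
        n_commit = max(1, len(masked) // (num_steps - step))
        for i in random.sample(masked, min(n_commit, len(masked))):
            sequence[i] = proposals[i]
            masked.remove(i)
    return sequence

print(generate_diffusion_style())
```

Instead of many small sequential steps, the work is a handful of wide passes over the whole sequence, which is a very different shape of computation for hardware to accelerate.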
What Are ASICs? And What’s their Role in AI?
ASICs are custom-built chips designed for specific tasks, in contrast to the general-purpose flexibility of CPUs and GPUs. In AI, ASICs like Google’s Tensor Processing Units (TPUs) are tailored to accelerate workloads such as training and inference for autoregressive LLMs.
By hardwiring optimizations for particular algorithms into their architecture, ASICs deliver superior performance and energy efficiency. This makes them indispensable for scaling AI in data centers and edge devices.
But this specialization comes at a cost: once fabricated, an ASIC’s design is fixed.
It’s optimized only for the algorithms and use cases envisioned during its development. For years, this rigidity aligned well with the dominance of autoregressive models. But the landscape is shifting.
Why Diffusion LLMs Pose a Problem for ASICs
The emergence of diffusion-based LLMs signals a potential crisis for ASICs. Autoregressive models process data sequentially, in exactly the manner ASICs are built to accelerate, while diffusion models operate through parallel, coarse-to-fine denoising steps. This architectural mismatch creates inefficiencies, as existing ASICs may struggle to adapt to the parallel processing demands of diffusion frameworks.
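A rough back-of-envelope comparison makes the mismatch easier to see. The sequence length and step count below are assumptions chosen purely for illustration, not measurements of any particular model or chip.

```python
# Back-of-envelope illustration (the numbers are assumptions, not measurements):
# how the work per generated token differs between the two decoding styles.

SEQ_LEN = 1024          # assumed output length in tokens
DENOISE_STEPS = 32      # assumed number of diffusion refinement steps

# Autoregressive: one sequential forward pass per generated token.
ar_passes = SEQ_LEN
ar_tokens_per_pass = 1

# Diffusion-style: far fewer passes, each covering every position at once.
diff_passes = DENOISE_STEPS
diff_tokens_per_pass = SEQ_LEN / DENOISE_STEPS

print(f"Autoregressive: {ar_passes} sequential passes, "
      f"{ar_tokens_per_pass} token finalized per pass")
print(f"Diffusion-style: {diff_passes} parallel passes, "
      f"~{diff_tokens_per_pass:.0f} tokens finalized per pass")
```

A chip whose datapath and memory hierarchy are hardwired for thousands of small sequential steps is not automatically well suited to a few dozen very wide parallel ones, and that gap cannot be patched after fabrication.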
The rapid pace of AI algorithm development amplifies this challenge. ASICs require years of design and fabrication along with significant financial investment. But their fixed nature leaves them vulnerable to obsolescence if algorithms evolve before they hit the market.
For example, Mercury (a diffusion-based LLM) reportedly generates over 1,000 tokens per second on NVIDIA H100 GPUs, which highlights how adaptable platforms can outmaneuver ASICs in supporting cutting-edge algorithms. This flexibility threatens ASICs’ dominance, suggesting that their role in AI may diminish unless they evolve.
Solutions to Address the Challenge
To counter this bear case, the industry must rethink its approach to AI hardware. One promising solution is the adoption of Field-Programmable Gate Arrays (FPGAs). They offer reconfigurability to accommodate new algorithms like diffusion models without requiring entirely new chips.
Unlike ASICs, FPGAs can be reprogrammed post-fabrication.
This provides a bridge between fixed efficiency and adaptability. Another option is the development of hybrid architectures that blend ASICs’ performance with programmable elements. This enables partial updates to handle emerging workloads.
Beyond hardware, collaboration between AI algorithm developers and chip designers is critical. By aligning innovations in software and hardware early in the design process, the industry can reduce the risk of mismatched capabilities. Finally, advanced simulation and emulation tools can shorten ASIC development cycles. This allows faster iterations to keep pace with algorithmic shifts.
FPGAs vs ASICs
FPGAs and ASICs are both critical hardware solutions in AI. But they differ fundamentally in design, flexibility, and use cases. Here’s a detailed comparison:
Design and Flexibility
ASICs: These are custom-built chips designed for a specific task, such as accelerating autoregressive LLMs. Once fabricated, their architecture is fixed: they lock in optimizations for a particular algorithm or workload. This rigidity maximizes efficiency but limits adaptability to new algorithms.
FPGAs: These are reconfigurable chips that can be programmed and reprogrammed after manufacturing. This flexibility allows FPGAs to adapt to evolving workloads without requiring new hardware. But they sacrifice some efficiency compared to ASICs.
Performance and Efficiency
ASICs: By tailoring their circuitry to a single purpose, ASICs offer superior performance and energy efficiency for their designated tasks. For example, Google’s TPUs excel at matrix operations for autoregressive models. They outperform general-purpose hardware in speed and power consumption.
FPGAs: While FPGAs deliver solid performance, they are less efficient than ASICs for any single task because of the overhead of their programmable fabric. But their ability to handle diverse workloads makes them competitive in dynamic environments.
Development Time and Cost
ASICs: Designing and fabricating an ASIC is a lengthy and expensive process. It often takes years and costs millions due to the need for custom silicon. This investment pays off only if the target algorithm remains relevant long-term.
FPGAs: FPGAs have a lower upfront cost and faster deployment since they don’t require custom fabrication. Updates can be made by reprogramming the device, reducing the time and expense of adapting to new algorithms.
Use Case in AI
ASICs: Ideal for stable, high-volume workloads where performance and efficiency are paramount.
FPGAs: Better suited for prototyping, research, or rapidly evolving fields like diffusion-based AI where adaptability trumps raw efficiency.
Where do we go from here?
Algorithm development in AI is a double-edged sword: it drives progress but also exposes the limitations of specialized hardware like ASICs. The rise of diffusion-based LLMs underscores this tension. It challenges the efficiency and relevance of ASICs optimized for outdated paradigms.
This disruption also presents an opportunity. ASICs excel in performance and efficiency for fixed tasks but falter when algorithms shift, as with diffusion LLMs. FPGAs trade some efficiency for flexibility, making them a viable alternative in AI’s fast-changing landscape. The choice between them depends on balancing long-term optimization with the need to adapt to innovation.
By embracing adaptable solutions like FPGAs and hybrid designs, the industry can balance performance with the agility needed to thrive in an AI-native future. As algorithms continue to evolve, the fate of ASICs will hinge on their ability to adapt.
If you're a founder or an investor who has been thinking about this, I'd love to hear from you.
If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 friend who’s curious about AI: