GPU Utilization Gaps and the Resulting Startup Opportunities
Factors influencing GPU utilization, and the opportunities for startups to address the gaps
Welcome to Infinite Curiosity, a weekly newsletter that explores the intersection of Artificial Intelligence and Startups. Tech enthusiasts across 200 countries have been reading what I write. Subscribe to this newsletter for free to receive it in your inbox every week:
In the world of consulting, there's a concept called the Utilization Rate (U-rate). It serves as a key metric for consultants. Consultants are employees who are paid fixed salaries by the firm, and the firm sells their billable hours to clients (at a markup, of course). So every hour a consultant sits idle costs the firm money, because their salary is already guaranteed. The U-rate is the ratio of billable hours to available hours. A 100% U-rate means the consultant has sold all their available hours, which is great for the firm.
Let's take this concept and apply it to GPU utilization rate. It's not apples-to-apples, but roll with me for a minute here. When a company buys GPUs, they have already made the capex investment. A GPU is an asset that needs to be put to work. In an ideal world, a GPU's utilization rate is 100%. It's never sitting idle. But various people have run different types of tests to figure out the average GPU utilization rate in the market. It's somewhere between 20% and 50% depending on who you ask. That's pretty low!
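If you're curious how you'd measure this on your own fleet, here's a minimal sketch using pynvml (NVIDIA's Python bindings for NVML). The one-minute sampling window is an arbitrary choice for illustration, and NVML's "GPU utilization" number is really a kernel-activity percentage, so treat it as a rough proxy:

```python
import time
import pynvml

# Sample kernel activity across all visible GPUs and average it.
pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

samples = []
for _ in range(60):  # ~1 minute window, purely illustrative
    per_gpu = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
    samples.append(sum(per_gpu) / len(per_gpu))
    time.sleep(1)

print(f"Average GPU utilization over the window: {sum(samples) / len(samples):.1f}%")
pynvml.nvmlShutdown()
```

Run something like this during a normal workday and the number is often surprisingly far from 100%.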
If a consultant is underutilized, the firm has to cover the costs. Similarly, if a GPU is underutilized, the company is paying for hardware that isn't doing work. The result is higher operational costs and lower margins.
There are many gaps in the GPU utilization infrastructure. Even the world's best tech companies have sub-50% GPU utilization rates. But at the same time, it's a fairly consolidated industry. The number of potential customers who can benefit from a startup building a GPU utilization platform is small. Big tech companies implement their own technologies to maximize GPU utilization.
Startups can look at all the factors influencing the GPU utilization rate and figure out what needs to be built. They can bundle the right subset of features into a product and provide it as a software offering.
What factors influence the GPU utilization rate?
Here is a list of factors that influence the GPU utilization rate to varying degrees:
Algorithms have to handle vast data volumes, which requires GPUs capable of efficient multi-GPU training.
Tasks like processing high-resolution medical images or videos demand GPUs with extensive memory.
Early development stages might not need high-performance GPUs, but training complex models does require them.
The capability to interconnect GPUs facilitates multi-GPU training. This is a feature often missing in consumer-grade GPUs. Ensuring robust data transfer paths between GPUs and CPUs keeps GPUs actively engaged.
Implementing hardware like NVIDIA Grace (CPU) or BlueField-3 (DPU) can significantly boost data center energy efficiency.
Effective cooling strategies are important for maximizing energy efficiency and GPU utility. Utilizing state-of-the-art GPUs that offer better energy efficiency and enhanced interconnects can significantly improve utilization.
You need GPU-accelerated libraries that work efficiently. Support in frameworks like PyTorch and TensorFlow is pivotal.
Techniques like containerization and fine-grained scheduling improve the control and efficiency of resource allocation.
By employing tools such as NVIDIA's DCGM, you can anticipate GPU demands and ensure optimal allocation and utilization. This is similar to consulting firms forecasting project needs for resource planning. Companies can dynamically assign GPUs to tasks by utilizing platforms like Kubernetes (see the sketch right after this list).
Regular audits with GPU profilers can help identify underperforming applications and surface opportunities for optimization.
GPU virtualization tools like NVIDIA GRID allow the division of a single GPU among multiple tasks or users.
You need to ensure that the GPU hardware matches the workload requirements in terms of memory, interconnects, and performance. This is crucial in maximizing the utilization rate.
Designing models that fully utilize the parallel processing capabilities of GPUs maximizes their efficiency.
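To make the Kubernetes point concrete, here's a minimal sketch using the official Kubernetes Python client: a job requests one GPU, and the cluster scheduler places it on a node with a free device. The image name and pod details are placeholders, and it assumes the cluster already runs the NVIDIA device plugin that exposes the nvidia.com/gpu resource:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster

# A container that asks for exactly one GPU. The scheduler will only place it
# on a node that has a free nvidia.com/gpu to give.
container = client.V1Container(
    name="trainer",
    image="nvcr.io/nvidia/pytorch:24.01-py3",  # placeholder image
    command=["python", "train.py"],            # placeholder entrypoint
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-job"),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The point isn't the specific API calls. It's that GPU assignment becomes a scheduling decision the platform makes for you, instead of someone hand-picking a machine.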
What are the viable startup opportunities here?
In the above list, there are many things that startups cannot easily do because they're very capital intensive (doable, but not easily): building new GPUs, interconnecting GPUs, building cooling technologies, and energy management technologies.
This leads us to the 3 gaps that can be addressed by startups:
GPU virtualization product: Before we discuss it, let's talk about virtualization in general. It's a key piece of technology that enabled large companies to utilize their resources effectively. VMware started doing this in 1998 and was acquired for $69 billion in 2022. So it's clearly a giant business if you do it right. Now some company has to figure out the GPU and AI equivalent of what VMware did.
Workload router product: If you use the most expensive GPU for every single workload, you'll go bankrupt. If the workload is light, it should be routed to a smaller machine. If there are many workloads, they need to be scheduled properly. In the real world, you need to handle this quickly (often in real time), so you can't hope to do it manually. A workload router product solves this. It understands what's available in your infrastructure, understands what the incoming workload is, and then routes it so that you spend the least amount of money executing that workload (there's a toy sketch of this after the list).
Verticalized AI chips: NVIDIA's GPUs can do a lot of tasks really well. But if you have a specific task or use case, verticalized AI chips could be an order of magnitude more efficient. You can build a new AI chip that's super fast and highly efficient for something really specific. Groq, for example, built a chip that's super fast at running inference for text-related applications. But it's an exceptional case: you need deep expertise, a lot of capital, and a lot of time to ship your first version. Most startups may not have that luxury. But if you get it right, it's going to be an enormous win.
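Here's the toy sketch of the workload router's core decision, just to make the idea concrete. The pool names, costs, and the naive "cheapest pool that fits" rule are all made up for illustration; a real product would layer on queueing, preemption, live telemetry, and smarter cost models:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GpuPool:
    name: str                 # e.g. a high-end cluster vs. a cheaper one (names are made up)
    memory_gb: int            # memory per GPU
    cost_per_gpu_hour: float  # what you pay to run a job here
    free_gpus: int            # currently idle GPUs

@dataclass
class Workload:
    name: str
    memory_gb: int            # memory the job needs per GPU
    gpus: int                 # number of GPUs the job needs

def route(workload: Workload, pools: list) -> Optional[GpuPool]:
    """Pick the cheapest pool that can actually fit the workload right now."""
    candidates = [
        p for p in pools
        if p.memory_gb >= workload.memory_gb and p.free_gpus >= workload.gpus
    ]
    if not candidates:
        return None  # queue the job, or burst to the cloud
    return min(candidates, key=lambda p: p.cost_per_gpu_hour * workload.gpus)

pools = [
    GpuPool("big-gpu-cluster",   memory_gb=80, cost_per_gpu_hour=4.0, free_gpus=2),
    GpuPool("small-gpu-cluster", memory_gb=24, cost_per_gpu_hour=0.8, free_gpus=8),
]
job = Workload("batch-inference", memory_gb=16, gpus=1)
print(route(job, pools).name)  # -> "small-gpu-cluster": cheapest machine that fits
```

The hard part of the product is everything around this function: knowing what's free, predicting what's coming, and doing it fast enough that nobody notices.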
If you're a founder or an investor who has been thinking about this, I'd love to hear from you. I’m at prateek at moxxie dot vc and our fund Moxxie Ventures leads seed rounds.
If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 friend who’s curious about AI: