Curves of Inefficiency in AI
What are curves of inefficiency, what are the key metrics, and where are the opportunities for builders?
Welcome to Infinite Curiosity, a weekly newsletter that explores the intersection of Artificial Intelligence and Startups. Tech enthusiasts across 200 countries have been reading what I write. Subscribe to this newsletter for free to receive it in your inbox every week:
AI is progressing at a breakneck pace! But at the same time, the AI tech stack is riddled with inefficiencies. People are making rapid advancements to fix these inefficiencies. This raises a question: what opportunities exist for builders in this phase of the AI journey?
Any company that fixes an inefficiency at a reasonable price ends up owning that part of the infrastructure. That's what led me to think about the concept of Curves of Inefficiency. It provides a useful framework to think about what inefficiencies to tackle first.
Where are these inefficiencies in the AI tech stack?
The AI tech stack comprises multiple layers such as data processing, training models, hardware, software orchestration, deployment, and more. Each of these components has unique inefficiencies: slow model training, high energy consumption, hardware bottlenecks, and networking challenges.
Mapping these inefficiencies through data-driven curves allows us to identify where the biggest gains are possible. And the truth is that there's a lot of low-hanging fruit just waiting for someone bold enough to tackle it.
What is a "Curve of Inefficiency"?
A "Curve of Inefficiency" is a graphical representation of a key performance metric (such as time, cost, or energy) against an input variable (such as model complexity, hardware availability, or workload size).
For instance, we could map the speed of training against model size or the number of GPUs used. These curves highlight the diminishing returns or bottlenecks inherent in each part of the stack. By understanding the shape of these curves, we can spot where the inefficiencies grow most dramatically, identify which segments of the stack are the most limiting, and therefore which represent the largest opportunities for improvement. If you think about it, these curves are essentially maps to startup gold mines.
The slope of the curve at any point is directly proportional to the size of the opportunity in that region.
Another way to say it: in the regions of the curve where inefficiency is highest, the opportunity for builders is strongest.
For example, let's consider plotting energy consumption on the Y axis and model complexity on the X axis. If a small increase in model complexity results in a much larger increase in energy consumption, the slope is very steep. This is inefficient. If you build something that reduces the incremental energy consumed for the same increase in model complexity, then you have something useful on your hands. You are reducing the slope in this part of the curve.
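Here's a minimal sketch of how you could quantify this, assuming you've measured energy consumption at a few model sizes (the numbers below are purely illustrative): approximate the slope numerically and look for the steepest segment.

```python
import numpy as np

# Hypothetical measurements: model complexity (billions of parameters)
# and the energy consumed to train each model (MWh). Illustrative numbers only.
complexity = np.array([1, 7, 13, 34, 70, 180, 400])       # input variable (X axis)
energy_mwh = np.array([5, 40, 90, 300, 800, 2500, 7000])  # key metric (Y axis)

# The slope of the curve approximates the incremental energy cost of adding
# model complexity -- the "inefficiency" at each point on the curve.
slope = np.gradient(energy_mwh, complexity)

# The steepest region of the curve is where a fix creates the most value.
steepest = int(np.argmax(slope))
print(f"Steepest segment around {complexity[steepest]}B params: "
      f"{slope[steepest]:.1f} MWh per extra billion params")
```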
Take OpenAI's GPT-4o for example. Training such a massive model requires an enormous amount of compute. And even though the model was successfully trained, the training efficiency was far from optimal. This highlights an opportunity for better distributed training solutions that could achieve similar results with fewer resources.
What are the key metrics that define inefficiency?
To quantify inefficiencies in the AI stack, we need to look at a comprehensive list of metrics. Each of these could potentially become the basis for a startup's mission.
Training Speed: How quickly can a model converge? This is a critical factor, and it can bottleneck innovation, especially as datasets grow larger and architectures become more complex.
Hardware Accelerators: Startups can focus on developing better distributed training algorithms or new hardware accelerators specifically designed to speed up training. For instance, Cerebras Systems developed the Cerebras Wafer-Scale Engine to tackle the inefficiency of model training by increasing the processing power available in a single chip.
Inference Speed: This is important for real-time applications where every millisecond counts. A company that can shave milliseconds off inference times for large-scale applications could dominate the corresponding market. Let's take Tesla's FSD (Full Self-Driving) computer as an example. It was designed specifically to reduce inference time, making real-time decisions on the road possible.
Power Consumption: This is another significant inefficiency. AI workloads can consume vast amounts of power. And measuring energy inefficiency is crucial. Google's use of TPUs (Tensor Processing Units) in their data centers highlights the efforts that are being undertaken to bring down power consumption. How? By creating hardware specifically optimized for AI computations. Startups that design ultra-efficient neural network architectures or specialized chips to reduce power usage could unlock huge savings.
FLOPs (Floating Point Operations): This measures the sheer volume of compute a model requires. It can illustrate the computational efficiency relative to the desired performance. There is a clear need for better optimization libraries or novel precision techniques that reduce the compute required while maintaining accuracy. For instance, Nvidia's mixed-precision training runs much of the math in lower precision, which makes training faster and more efficient without compromising model quality.
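To make the precision idea concrete, here is a minimal sketch of a mixed-precision training loop using PyTorch's automatic mixed precision (AMP) API. The model, data, and hyperparameters are placeholders, and it assumes a CUDA GPU is available.

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(1024, 10).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # rescales gradients to avoid fp16 underflow

for step in range(100):
    # Placeholder batch; in practice this comes from a DataLoader.
    x = torch.randn(32, 1024, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():   # run the forward pass in lower precision
        loss = loss_fn(model(x), y)

    scaler.scale(loss).backward()     # backward pass on the scaled loss
    scaler.step(optimizer)
    scaler.update()
```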
Chip Manufacturing Speed: This is another key bottleneck. AI growth is constrained by the speed of hardware development. Watch this 3-min video clip where Larry Ellison (cofounder of Oracle) talks about how he and Elon Musk had dinner with Jensen Huang to beg for more GPUs. If chip production can't keep pace with algorithmic advancements, then we have a bottleneck to deal with. There's a need to streamline semiconductor manufacturing processes, or even to work on flexible hardware that can adapt to new AI requirements without long lead times. Consider the recent chip shortages that impacted industries across the globe, stalling AI research and deployment projects. They demonstrated just how vulnerable the AI ecosystem is to delays in chip manufacturing.
Networking: GPUs need to be networked together and used effectively to run AI workloads. This is even more relevant for distributed GPU clusters, which involve significant data movement, and the resulting latency can hinder scaling. There is an opportunity to develop better GPU interconnect solutions, whether through hardware advancements or more efficient networking protocols. Nvidia's NVLink is a real-world example that addresses this problem by enabling faster communication between GPUs.
Memory Bandwidth: This is another crucial area. Moving data in and out of memory is often a major bottleneck in AI systems. A startup that can solve memory access issues—either through hardware solutions or new memory architectures—could revolutionize how AI workloads are handled. Graphcore’s IPU (Intelligence Processing Unit) is a step in this direction, with its architecture designed to maximize memory bandwidth for AI workloads.
Data Preprocessing: The data preprocessing overheads also contribute significantly to inefficiencies. Getting data into the right shape and format is cumbersome. And often a significant portion of time is wasted here. Automated data preprocessing tools that reduce human involvement or intelligently optimize data pipelines could save organizations enormous amounts of time. Companies are already pushing the envelope here by providing tools to automate the labeling and data preparation process. This allows data scientists to focus more on model innovation rather than the grunt work of data prep.
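As a rough sketch of what these tools automate, here is what a declarative preprocessing pipeline looks like with scikit-learn. The columns and data below are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data with missing values and mixed column types.
df = pd.DataFrame({
    "age": [34, None, 45, 29],
    "income": [72000, 51000, None, 88000],
    "segment": ["a", "b", "a", None],
})

numeric = ["age", "income"]
categorical = ["segment"]

# One declarative pipeline replaces ad-hoc cleanup scripts:
# impute, scale, and encode in a single reusable step.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

X = preprocess.fit_transform(df)
print(X.shape)
```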
Model Compression: Large models are inefficient to deploy due to their size. And there is a need to focus on model compression techniques such as pruning, quantization, or even dynamic architectures that adjust complexity based on context. DeepMind's work with AlphaZero showed that despite the enormous computational power initially required, pruning techniques helped make the model more efficient without sacrificing performance. This opened the door for using similar approaches across other domains.
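For a concrete flavor of what compression looks like in practice, here is a minimal sketch using PyTorch's built-in pruning and dynamic quantization utilities on a toy model. This is illustrative, not a production recipe.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy model standing in for a much larger network.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Pruning: zero out the 50% of weights with the smallest magnitude
# in each linear layer, then make the sparsity permanent.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")

# Dynamic quantization: store and compute linear-layer weights in int8
# at inference time, shrinking the model and speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    print(quantized(torch.randn(1, 512)).shape)
```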
Hyperparameter Tuning: This is an expensive and often trial-and-error process. If you can automate or intelligently guide hyperparameter search using methods like Bayesian optimization or reinforcement learning, it can help practitioners achieve better results faster. Google's AutoML is a notable example that has shown how automating hyperparameter tuning can provide access to powerful AI models without requiring extensive manual tuning expertise.
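As a small illustration, here is a sketch of a Bayesian-style hyperparameter search using Optuna, whose default sampler is TPE. The objective function below is a synthetic stand-in for a real training run that would return validation loss.

```python
import optuna

# Toy objective standing in for "train a model and return validation loss".
# In practice this would launch a real training run with the suggested values.
def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    hidden = trial.suggest_int("hidden_units", 64, 1024, log=True)
    # Synthetic loss surface with a known optimum, just for illustration.
    return (lr - 1e-3) ** 2 * 1e4 + (dropout - 0.1) ** 2 + (hidden - 256) ** 2 / 1e5

# Optuna's default TPE sampler guides the search toward promising regions
# instead of sampling the space blindly.
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```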
Orchestration: Deployment and orchestration remain challenging across hybrid cloud environments. There's a need to optimize the orchestration of AI workloads. How? By ensuring high availability and minimizing latency for critical applications. Kubernetes was initially designed for container orchestration. People have tried to adapt it to handle the complexities of AI model deployment, but it still needs considerable work. This demonstrates that general-purpose tools need a significant overhaul to manage AI workloads effectively.
Data Labeling: High-quality labeled data is the backbone of successful AI models, but labeling is resource-intensive and slow. Tools that can automate labeling, reduce reliance on human labelers, or improve data quality could help address one of the most costly parts of AI development. Scale AI is a prominent example of a company that has managed to streamline the labeling process by using a mix of human and machine labeling techniques, thereby significantly reducing the time and cost associated with data preparation.
How can builders find opportunities along these curves?
Finding opportunities hinges on identifying the most inefficient points on these curves. The goal is to turn steep rises into more manageable slopes. You basically need to flatten the curve.
Analyzing knee points is crucial. The "knee" of a curve is where performance starts to see diminishing returns relative to resource investment, and this often signals a key inefficiency. These are the moments to explore alternative paths, e.g. more effective architectures, compression techniques, or new hardware configurations.
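One simple way to locate a knee programmatically: find the point on the curve farthest from the straight line joining its endpoints. Here is a minimal sketch of that heuristic on an illustrative scaling curve (the data is made up for demonstration).

```python
import numpy as np

def find_knee(x: np.ndarray, y: np.ndarray) -> int:
    """Return the index of the point farthest from the chord joining the
    curve's endpoints -- a simple knee-detection heuristic."""
    # Normalize both axes so neither dominates the distance calculation.
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    p1, p2 = np.array([xn[0], yn[0]]), np.array([xn[-1], yn[-1]])
    chord = (p2 - p1) / np.linalg.norm(p2 - p1)
    pts = np.stack([xn, yn], axis=1) - p1
    # Perpendicular distance of each point from the chord.
    dist = np.abs(pts[:, 0] * chord[1] - pts[:, 1] * chord[0])
    return int(np.argmax(dist))

# Illustrative curve: throughput gains flatten as GPUs are added.
gpus = np.arange(1, 33, dtype=float)
throughput = 100 * np.log2(gpus + 1)
knee = find_knee(gpus, throughput)
print(f"Diminishing returns set in around {int(gpus[knee])} GPUs")
```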
Instead of focusing solely on individual metrics, you should take a stack-wide view. For example, improving hardware efficiency may also reduce power consumption. Startups that can address multiple inefficiencies at once can capture a larger piece of the value chain. Systemic optimization is also a powerful way to address inefficiencies. Enhancing communication between software and hardware could alleviate networking issues or bottlenecks caused by poorly timed GPU coordination. This is fertile ground for startups focusing on more intelligent orchestration layers or systems that self-optimize based on workload.
The curves of inefficiency can act as blueprints for innovation. Startups that identify the steepest parts of these curves and develop targeted products will find themselves at the cutting edge.
If you're a founder or an investor who has been thinking about this, I'd love to hear from you.
If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 friend who’s curious about AI: