Product Category Memo #4: Software Products That Speed Up ML On Hardware
In-depth analysis of products that can port, optimize, and deploy ML on different hardware platforms
Hello friends,
Welcome to the 4th edition of Product Category Memo. The goal of this segment is to provide an in-depth analysis of a specific product category in AI, ML, and Data. We'll cover the product description, what users expect from the product, the landscape of competing products, pricing mechanics, and growth strategy.
In this post, we’ll talk about:
Why do we need software products to make ML work on different hardware platforms
Who needs this product and what do they want
What products are competing in this market
What factors drive pricing
How these products acquire customers
This post is not about hardware. This post is about software that can make ML work on different hardware platforms.
If you have a question, submit it here and I’ll get back to you with my thoughts. Subscribe to this newsletter to receive it in your inbox every week:
Let's dive in.
Why do we need software products to make ML work on different hardware platforms?
The problem is four-fold:
1. There has been rapid growth in ML-infused use cases e.g. recognizing images, understanding text, recommendation engines
2. For each use case, people are building many different machine learning models e.g. using different training data, using different hyperparameters, using different precision/recall metrics, using domain-specific knowledge
3. For each model, there are many machine learning frameworks that can be used e.g. TensorFlow, PyTorch
4. For each framework, there are many hardware targets where these models can be deployed e.g. GPUs, CPUs, mobile phones, edge devices
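The four axes above multiply together, which is what makes the problem explode. Here's a minimal sketch of that arithmetic; the counts on each axis are hypothetical, chosen only to mirror the examples in the list:

```python
# Hypothetical counts for each axis of the fragmentation problem above.
use_cases = ["image recognition", "text understanding", "recommendations"]
models_per_use_case = 4  # e.g. different training data, hyperparameters, metrics
frameworks = ["TensorFlow", "PyTorch"]
targets = ["GPU", "CPU", "mobile phone", "edge device"]

# Every (use case, model, framework, target) combination is potentially
# a separate porting-and-optimization project.
combinations = (
    len(use_cases) * models_per_use_case * len(frameworks) * len(targets)
)
print(combinations)  # 3 * 4 * 2 * 4 = 96 distinct deployment paths
```

Even with these tiny illustrative numbers, a handful of use cases already produces close to a hundred distinct deployment paths, and each new framework or hardware target multiplies that count rather than adding to it.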
If you want to deploy an ML model built with PyTorch on Nvidia GPUs, you need Nvidia's custom software stack along with the right libraries and frameworks. Now suppose someone else wants to deploy the exact same model on Nvidia GPUs, but they built it with TensorFlow. That becomes an entirely new project.
Because of this fragmentation, ML models are not easily portable. You cannot just copy-paste your model from one platform to the next and expect it to work. Even when it does run, performance often drops significantly when moving from one platform to another.
As you can imagine, this quickly becomes unmanageable as we look at all the different tools and hardware targets. How do we manage this? This is where we need a software product that can handle this job for us.
Who needs this product and what do they want?
This product is used by ML researchers, ML engineers, and hardware engineers. In an ideal world, they need a single software product that can handle the entire workload:
It should allow them to build an ML model using their own recipe. A recipe can consist of frameworks, algorithms, and libraries.
Once they create a recipe, the product should take it as input and optimize it. This helps them speed up the training process while reducing the cost.
Once the recipe produces an ML model, it should take that model as input and create various versions that can work on all the different target platforms.
Once the model has been ported to all the platforms, it should optimize those versions so that the performance remains good on each platform.
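The four steps above can be sketched as a pipeline. None of the names below come from a real product; they are a hypothetical interface that only mirrors the workflow described:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the ideal end-to-end workflow:
# recipe -> optimized training -> model -> port per target -> optimize per target.

@dataclass
class Recipe:
    framework: str                     # e.g. "PyTorch"
    algorithm: str
    libraries: list = field(default_factory=list)

@dataclass
class Model:
    name: str
    target: str = "generic"
    optimized: bool = False

def optimize_recipe(recipe: Recipe) -> Recipe:
    # Step 2: speed up training while reducing cost (placeholder).
    return recipe

def train(recipe: Recipe) -> Model:
    # The recipe produces an ML model (placeholder).
    return Model(name=f"{recipe.algorithm}-model")

def port(model: Model, targets: list) -> list:
    # Step 3: create one version of the model per hardware target.
    return [Model(name=model.name, target=t) for t in targets]

def optimize_for_target(model: Model) -> Model:
    # Step 4: tune each ported version so performance stays good.
    model.optimized = True
    return model

recipe = optimize_recipe(Recipe(framework="PyTorch", algorithm="resnet"))
versions = [optimize_for_target(m) for m in port(train(recipe), ["GPU", "CPU", "edge"])]
```

The point of the sketch is the shape, not the placeholders: one recipe goes in, and one optimized artifact comes out per target, without the user re-doing the work for each platform.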
It's not an easy problem to solve. And there's a huge demand for software products that can do this job.
What products are competing in this market?
There's an open source Apache project (Apache TVM) dedicated to this effort, and a few different companies are building products in this market. Each company has its own approach to delivering a working solution.
What factors drive pricing?
Pricing is driven by the following factors:
Number of runtime optimizations
Number of hardware targets
Number of frameworks included
Provision to train and manage the models
Provision to benchmark and optimize the performance
Availability of a model catalog
Deployment tools
Access to APIs
SLA terms
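One way these factors combine is usage-based tiering. The sketch below is hypothetical: every number, tier, and add-on name is invented purely to show how the metrics above could roll up into a quote, not how any real vendor prices:

```python
# Hypothetical pricing sketch. All figures are invented for illustration.
BASE_MONTHLY = 500            # flat platform fee
PER_HARDWARE_TARGET = 200     # each supported hardware target adds cost
PER_FRAMEWORK = 100           # each included ML framework adds cost
ADDONS = {                    # capability-based add-ons from the list above
    "model_catalog": 150,
    "benchmarking": 250,
    "api_access": 100,
}

def monthly_quote(targets: int, frameworks: int, addons: list) -> int:
    """Combine per-unit metrics and add-ons into a monthly price."""
    price = BASE_MONTHLY
    price += targets * PER_HARDWARE_TARGET
    price += frameworks * PER_FRAMEWORK
    price += sum(ADDONS[a] for a in addons)
    return price

quote = monthly_quote(targets=3, frameworks=2, addons=["benchmarking"])
print(quote)  # 500 + 600 + 200 + 250 = 1550
```

The structure matters more than the numbers: the per-unit metrics (targets, frameworks) scale with usage, while capabilities like benchmarking or a model catalog gate as add-ons, and items like SLA terms typically move a customer between tiers rather than appearing as line items.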
How do these products acquire customers?
These products can use the following ways to acquire customers:
Open-source adoption: This has been the most popular channel in this category. Products aim to get individual developers to adopt them by providing an open source package. That's how a product makes its way into an organization.
Community: There needs to be a strong developer community around a product. The strength of this community will dictate how many ML practitioners adopt a new tool.
Integration capabilities: A product needs to play well with all the tools along the ML value chain. If your product doesn't integrate well, it won't be adopted by the developers.
End-to-end solution: A product needs to be an end-to-end solution to penetrate large companies; point solutions have a hard time there. These companies won't stitch together a bunch of tools themselves to get their work done. They will end up going with a company that provides the whole solution.
Level of compatibility: Products in this category need to be compatible with a range of hardware platforms and ML frameworks. And they need to keep themselves updated as those platforms are updated. It takes a good amount of effort to keep track of all the companies updating their hardware, libraries, frameworks, and drivers.
If you are getting value from this newsletter, consider subscribing and sharing with your friends: