Hello readers,
Welcome to the 2nd edition of Product Category Memo. The goal of this segment is to do an in-depth analysis of a specific product category within AI, ML, and Data. We’ll talk about product description, market dynamics, landscape of products, pricing mechanics, growth strategy, and the characteristics of category-leading products.
If you have a question, submit it here and I’ll get back to you with my thoughts. Subscribe to this newsletter to receive it in your inbox every week:
In this post, we’ll talk about:
What is a feature store
Who uses feature stores and what do they want
What products are competing in this market
What factors drive pricing
How these products acquire customers
How a product can win this market segment
Let’s dive in.
What is a feature store?
Before we talk about feature stores, let's talk about what a feature is. A feature is a representation of data that can be used as an input to your machine learning model. You cannot just feed raw data into a model and expect it to do magic. You need to convert it into a form that the model can interpret. Technically, you can feed raw data into a neural network, but the network still converts that data into features internally as part of learning the model.
For example, let's say you have the following data about a group of people:
date of birth
location
education level
Based on this, let's say that your task is to predict their income level. The raw data would be dates, zip codes, and text. What does a "feature" look like here? It would be something like "was this person born before 1985?", "does this person live in a large city?", or "does this person have a bachelor's degree?". Now that we know what a feature is, let's talk about what a feature store is.
A feature store is a place to store all the features in a centralized manner. This allows everyone on the team to access them at any time and reuse them across different projects. Without a feature store, every person on the team ends up redoing the same work of converting raw data to features. That is expensive, time-consuming, and prone to errors.
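To make the idea concrete, here is a minimal sketch in Python. It turns the raw record from the example above into the three features, writes them to a central store once, and lets a second project read them back without recomputing anything. Everything in it is an assumption made for illustration: the field names, the `LARGE_CITY_ZIP_CODES` set, the `compute_features` function, and the `InMemoryFeatureStore` class are hypothetical and do not come from any real feature store product.

```python
from datetime import date

# Hypothetical lookup of zip codes that count as a "large city" (illustrative only).
LARGE_CITY_ZIP_CODES = {"10001", "60601"}

# Hypothetical education levels that imply at least a bachelor's degree.
DEGREE_LEVELS = {"bachelors", "masters", "doctorate"}


def compute_features(raw):
    """Convert one raw record into model-ready features."""
    return {
        "born_before_1985": raw["date_of_birth"] < date(1985, 1, 1),
        "lives_in_large_city": raw["zip_code"] in LARGE_CITY_ZIP_CODES,
        "has_bachelors_degree": raw["education_level"] in DEGREE_LEVELS,
    }


class InMemoryFeatureStore:
    """A toy centralized store: features are written once and read by anyone."""

    def __init__(self):
        self._rows = {}

    def write(self, entity_id, features):
        # Merge new feature values into whatever is already stored for this entity.
        self._rows.setdefault(entity_id, {}).update(features)

    def read(self, entity_id, feature_names):
        # Return only the features the caller asked for.
        row = self._rows[entity_id]
        return {name: row[name] for name in feature_names}


store = InMemoryFeatureStore()

# One team computes the features and writes them to the store.
raw_person = {
    "date_of_birth": date(1980, 6, 15),
    "zip_code": "60601",
    "education_level": "high_school",
}
store.write("person_1", compute_features(raw_person))

# Another project reuses the stored features instead of redoing the conversion.
income_model_inputs = store.read(
    "person_1", ["born_before_1985", "has_bachelors_degree"]
)
print(income_model_inputs)
```

A production feature store adds persistence, access control, and low-latency serving on top of this, but the core contract is the same: compute once, store centrally, read many times.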
Who uses feature stores and what do they want?
The following people tend to use feature stores for their day-to-day work:
ML practitioners
Data scientists
Data engineers
Here are 7 key functionalities that the users want from a feature store product:
Be able to convert raw data into features. It should automate feature computation, backfills, and logging.
Be able to store and retrieve features quickly.
Detect and surface data issues as they occur. This helps monitor the health of feature pipelines in production.
Serve features to models. It should help achieve consistency between training and serving data.
Have a centralized catalog of feature definitions and metadata. It should track feature versions and lineage as well.
Enable users to bring new features to production without a lot of engineering and ops support.
Enable users to share and reuse feature pipelines across teams.
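Two of the items above, the centralized catalog with versions and the consistency between training and serving data, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FeatureDefinition`, `FeatureCatalog`, and the `source_column` lineage field are hypothetical names, not the API of Feast, Hopsworks, or any other product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical catalog entry: a definition plus its metadata."""
    name: str
    version: int
    description: str
    source_column: str  # simple lineage: which raw column the feature comes from
    transform: Callable[[dict], object]


class FeatureCatalog:
    """A toy centralized catalog keyed by (name, version)."""

    def __init__(self):
        self._definitions: Dict[Tuple[str, int], FeatureDefinition] = {}

    def register(self, definition):
        key = (definition.name, definition.version)
        if key in self._definitions:
            raise ValueError(f"{definition.name} v{definition.version} already exists")
        self._definitions[key] = definition

    def get(self, name, version):
        return self._definitions[(name, version)]

    def latest_version(self, name):
        versions = [v for (n, v) in self._definitions if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions)


catalog = FeatureCatalog()
catalog.register(FeatureDefinition(
    name="has_bachelors_degree",
    version=1,
    description="True only for an exact bachelor's degree",
    source_column="education_level",
    transform=lambda raw: raw["education_level"] == "bachelors",
))
catalog.register(FeatureDefinition(
    name="has_bachelors_degree",
    version=2,
    description="True for a bachelor's degree or higher",
    source_column="education_level",
    transform=lambda raw: raw["education_level"] in {"bachelors", "masters", "doctorate"},
))

# Training and serving both pull the same versioned definition,
# so the feature is computed identically in both places.
definition = catalog.get("has_bachelors_degree", 2)
training_rows = [{"education_level": "masters"}, {"education_level": "high_school"}]
training_values = [definition.transform(row) for row in training_rows]
serving_value = definition.transform({"education_level": "masters"})
```

Pinning a model to a specific feature version is what keeps a later change to the definition from silently changing the inputs of a model that is already in production.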
What products are competing in this market?
The concept of a feature store was introduced by Uber in 2017. They published this article about their internal tool called Michelangelo.
The market has two groups of products: open source tools and commercial products.
Here are the open source tools available:
Feast (by Gojek)
Hopsworks (by Logical Clocks)
Butterfree (by QuintoAndar)
Commercial products include Tecton and the feature stores offered by cloud providers such as Amazon and Google.
What factors drive pricing?
The pricing is based on usage and performance requirements, but it is not very standardized across companies. Feature stores are a relatively new development in ML, so vendors are still figuring out how to price their products and standardize pricing across the board. The following factors tend to drive pricing:
Number of feature reads
Number of feature writes
Amount of storage needed
Latency requirements
SLA requirements
The price goes up as users need more reads, writes, and storage, or stricter latency and SLA guarantees.
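As a rough illustration of how these five factors could combine into a bill, here is a small Python sketch. Every rate and multiplier in it is a made-up placeholder, and the `Usage` class and `estimate_monthly_cost` function are hypothetical. No vendor's actual pricing is implied. The only point is the shape of the calculation: usage-based charges, scaled up by performance requirements.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    feature_reads: int     # reads per month
    feature_writes: int    # writes per month
    storage_gb: float      # stored feature data, in GB
    low_latency: bool      # whether a strict latency tier is required
    premium_sla: bool      # whether a stronger SLA tier is required


# All numbers below are placeholders for illustration, not real prices.
PRICE_PER_MILLION_READS = 1.00
PRICE_PER_MILLION_WRITES = 2.00
PRICE_PER_GB_MONTH = 0.50
LOW_LATENCY_MULTIPLIER = 1.5
PREMIUM_SLA_MULTIPLIER = 1.25


def estimate_monthly_cost(usage):
    """Usage-based charges, scaled by latency and SLA requirements."""
    base = (
        usage.feature_reads / 1_000_000 * PRICE_PER_MILLION_READS
        + usage.feature_writes / 1_000_000 * PRICE_PER_MILLION_WRITES
        + usage.storage_gb * PRICE_PER_GB_MONTH
    )
    multiplier = 1.0
    if usage.low_latency:
        multiplier *= LOW_LATENCY_MULTIPLIER
    if usage.premium_sla:
        multiplier *= PREMIUM_SLA_MULTIPLIER
    return base * multiplier


basic = Usage(10_000_000, 1_000_000, 100, low_latency=False, premium_sla=False)
demanding = Usage(10_000_000, 1_000_000, 100, low_latency=True, premium_sla=True)

basic_cost = estimate_monthly_cost(basic)
demanding_cost = estimate_monthly_cost(demanding)
print(basic_cost, demanding_cost)
```

With identical reads, writes, and storage, the second customer pays more purely because of the latency and SLA requirements.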
How do these products acquire customers?
The primary mechanism to acquire customers has been open source adoption. The concept of a feature store only emerged in 2017, so the entire sector is still nascent. Hopsworks followed in 2018 and Feast in 2019. Tecton is a major player in this field, and its founders are the creators of Michelangelo.
Another mechanism has been for cloud providers like Amazon and Google to provide feature stores as part of their offering. They have huge customer bases, and those customers are already in the ecosystem, so it's easier for them to start using the feature store.
How does a product win in this category?
A product can pull on the following key levers to win:
Open-source adoption: A product needs to get individual developers to adopt it. That's how a feature store product makes its way into an organization.
Community: There needs to be a strong community around a product. The strength of the open-source community will dictate how many ML practitioners adopt a new tool.
End-to-end solution: A product needs to be an end-to-end solution to penetrate large companies. If a product is a point solution, it will have a hard time getting in. What's a point solution? It's a tool that does one thing well. The company then has to stitch many such tools together to get its work done. This leads to a lot of integration issues because the tools may or may not play nicely with each other.
Integration capabilities: A product needs to play well with all the tools along the ML value chain. If your product doesn't integrate well, it won't be adopted by the developers.