Hello readers,
I’m launching a new weekly segment today called Product Category Memo.
The goal of this segment is to take a deep dive into a specific product category within AI, ML, and data. We’ll cover the product description, market dynamics, product landscape, pricing mechanics, growth strategy, and the characteristics of category-leading products.
In this post, we’ll talk about:
What is a data science notebook
Who uses notebooks and what do they want
What products are competing in this market
What factors drive pricing
How these products acquire customers
How a product can win this market segment
Let’s get into it.
What is a data science notebook?
A notebook is an interactive computing environment where you can write and execute code. Data science notebooks are oriented towards data exploration: practitioners can explore data, experiment with various techniques, and visualize the results.
Who uses notebooks and what do they want?
There are 54 million data practitioners around the world who work with data day to day. Their roles include:
data scientists
data analysts
business analysts
Data practitioners have to explore data from a variety of sources on a daily basis.
Here are six criteria a product must meet to be useful:
No setup required
Nobody wants to spend time setting up an environment. It's 2022, so everything should be hosted in the cloud and work out of the box.
No management of libraries
Nobody wants to keep track of which library versions are installed. This can and should be automated. Practitioners want environments that are up to date and consistent across their team.
Easy to load data
Getting data into your notebook can be a hassle, and modern practitioners expect it to be straightforward. They also need good control over their data, including who has access to it.
Easy to collaborate with teammates
Data practitioners need to collaborate with their teammates, so a notebook product should be inherently collaborative and make sharing a breeze. Early notebook products didn't have collaboration features. Google Colab took this opportunity to make sharing a core part of its product, and the Jupyter project came up with JupyterHub, which serves notebooks to multiple users on shared infrastructure. Today, modern notebook products are collaborative by default.
Easy to visualize
Data scientists frequently need to visualize their output, both for themselves and for others, and they want to share their findings with non-technical teammates. Their notebook product should make visualization easy.
Should support multiple languages
Let's say there are three people on a team: one uses SQL, another uses Python, and the third uses R. Should they use three different notebook products? Not a good idea. A notebook product should support multiple languages, so you can write a SQL query, one teammate can explore the data in Python, and another can explore it in R, all in the same tool. Even better is doing all of this in a single notebook instance, which means the product needs to be smart enough to accommodate multiple languages side by side.
What products are competing in this market?
Just as Jupiter is the biggest planet in our solar system, Jupyter is the biggest product in the notebook ecosystem. It was one of the first notebook products to build a real user base, and it continues to lead. There are 9.2 million public Jupyter notebooks on GitHub today.
Jupyter is open source, and many companies have built notebook products with Jupyter as their base. I'm going to make another astrophysics joke: Jupyter's gravity is strong, so there are products orbiting around it.
Here's a list of notebook products that have Jupyter as their base:
Then there is a set of notebook products that seek to innovate on the user experience. To gain any traction, a product must be compatible with Jupyter, or else users may not adopt it. So these products let you import your Jupyter notebooks while also providing key features that make the experience better.
Here’s the list of products:
Nextjournal (notebook for reproducible research)
Polynote (mixing multiple languages in one notebook)
Then there are products that aim to provide more than just notebooks. The key issue with these products is that the notebook ends up being one feature of a larger product, so user devotion to it tends to get diluted.
Here’s the list:
What factors drive pricing?
The pricing is not as simple as charging per user. It's more nuanced than that. Here are the factors that affect the pricing of a notebook product:
Number of people using it: How many users are using this product?
Number of projects: How many active projects do you have?
Version history: How much history do you want to store?
Compute power: Do you want larger virtual machines to run your code?
Compute time: How long do you want to use those machines for?
Additional services: Do you need task scheduling, containers, single sign-on (SSO), or an SLA?
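To make the point concrete, here is a minimal sketch of how a usage-based bill could combine the factors above. Every rate, tier name, and included allowance, as well as the `Usage` class and `monthly_bill` function, is a hypothetical placeholder for illustration only; none of it reflects any vendor's actual pricing.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# All rates below are HYPOTHETICAL placeholders, not any real product's prices.
SEAT_PRICE = 20.0                 # per user per month
INCLUDED_PROJECTS = 3             # active projects included at no extra cost
PROJECT_PRICE = 5.0               # per active project beyond the included ones
INCLUDED_HISTORY_DAYS = 7         # version history retained at no extra cost
HISTORY_PRICE_PER_30_DAYS = 2.0   # per seat, per extra 30-day block of history
VM_HOURLY = {"small": 0.0, "medium": 0.50, "large": 2.00}  # per compute hour
ADD_ON_PRICE = {"scheduling": 50.0, "sso": 100.0, "sla": 250.0}  # flat monthly


@dataclass
class Usage:
    """Hypothetical monthly usage profile for one team."""
    seats: int
    projects: int
    history_days: int
    vm_tier: str
    compute_hours: float
    add_ons: tuple[str, ...] = ()


def monthly_bill(u: Usage) -> float:
    """Sum each pricing factor into one monthly total (hypothetical model)."""
    seats = u.seats * SEAT_PRICE
    projects = max(0, u.projects - INCLUDED_PROJECTS) * PROJECT_PRICE
    # Extra history is billed in 30-day blocks, rounded up, for every seat.
    extra_days = max(0, u.history_days - INCLUDED_HISTORY_DAYS)
    history = math.ceil(extra_days / 30) * HISTORY_PRICE_PER_30_DAYS * u.seats
    compute = VM_HOURLY[u.vm_tier] * u.compute_hours
    add_ons = sum(ADD_ON_PRICE[name] for name in u.add_ons)
    return round(seats + projects + history + compute + add_ons, 2)


solo = Usage(seats=1, projects=2, history_days=7, vm_tier="small", compute_hours=40)
team = Usage(seats=10, projects=8, history_days=90, vm_tier="large",
             compute_hours=100, add_ons=("sso", "scheduling"))
print(monthly_bill(solo))  # 20.0
print(monthly_bill(team))  # 635.0
```

With these made-up rates, the solo user pays only the seat fee, while more than two-thirds of the ten-person team's bill comes from items other than seats (compute, add-ons, history, and extra projects). That is why per-user pricing alone doesn't capture how these products make money.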
How do these products acquire customers?
Bottom-up adoption has been the primary go-to-market (GTM) motion here: you win individual users, and they bring in their teammates. Because the product is inherently collaborative, the potential for growth is high. That's what companies like Deepnote and Hex leveraged to grow fast.
This subsector has attracted top-notch investors like a16z, Accel, Insight, and others. It will be interesting to see how these products develop to capture more of the enterprise budget as they move upmarket.
How does a product win in this category?
A product can pull on the following key levers to win:
Bottom-up adoption: A product needs to get individual users to adopt. That's how a notebook product makes its way into an organization.
Community: There needs to be a strong community around a product. Bottom-up tools tend to benefit from having a community.
Entry point: Products that catch users early in their career lifecycle tend to have staying power. Habits last a long time. If a product is part of a data practitioner's habits when they're first formed, it will stay there for a long time. Deepnote has partnered with many universities and gives the product away for free to students. This is a great user acquisition strategy.
Design: Slick design is almost a prerequisite for modern products. A product needs to be easy on the eye. Plus it needs to be intuitive. So many data products tend to miss the mark here. Deepnote has done a great job on this front.