14 Types Of Products You Can Build For Machine Learning Practitioners
List of tasks that appear in an ML practitioner's day-to-day. Insights on how to build products that perform those tasks.
Hey reader, welcome to the 💥 free edition 💥 of my weekly newsletter. I write about ML concepts, how to build ML products, and how to thrive at work. You can learn more about me here. Feel free to send me your questions and I’m happy to offer my thoughts. Subscribe to this newsletter to receive it in your inbox every week.
The machine learning economy is rapidly expanding. Many practitioners are entering this field, which means there are opportunities to build products for this ever-growing community. This post is for people who want to build a product for ML practitioners. It lists tasks you can take off their hands. As a longtime ML practitioner, I've done these tasks both manually and with the help of a product.
This post is not about applications of ML. You can use the list below to build many types of applications. For example, let's say you have a product that labels raw data. You can use this product to build a finance app, a healthcare app, or an ecommerce app. The goal here is to discuss the tasks that appear in an ML practitioner's day-to-day life and how to build products that can perform those tasks.
Task #1: Exploring data collaboratively with your team
This is usually the first step in the process of building a data product. Data comes in and you need to explore it to find out what's happening. This task includes writing code in a notebook and sharing it with your teammates. It's meant for people whose work lives in notebooks. Hosted notebooks let you open a web-based platform and start writing code. You don't have to worry about installation, dependencies, or whether the code will run on your teammate's laptop.
Examples: Jupyter, Google Colab, Hex, Deepnote
Task #2: Labeling the data
If you're an ML practitioner, you need labeled data to build your models. If the dataset has millions of data points, labeling it is a huge undertaking. Converting raw data to labeled data is a critical step in supervised machine learning work, because the algorithms need ground truth labels to learn from the data and build the models. Data labeling products take raw data from the users as the input and return labeled data as the output.
Examples: Scale AI, Labelbox, Appen
Task #3: Building shareable data apps
Let's say you're experimenting with ecommerce transaction data and want the marketing team to have access to the live results. You've written the code, but they can't run your Python code. They just want to click a link and access the output graph via their browser. But you don't want to embark on a new project to build a separate app for them. How do you handle it? You need a product that can convert your code into a shareable data app. You can collaborate with your ML team to write the code and then use this product to convert that code into a live web app. The marketing team can just click on a link and see the output. This product helps with building shareable web apps directly from your ML scripts.
Examples: Streamlit Cloud, Dash, Shiny, Voila
Task #4: Storing and retrieving features
You have a lot of data. You keep experimenting with different models, which means you have to extract features. You don't want to repeat this work because it's computationally expensive. If the features already exist for a particular data point, you want to reuse them rather than extract them again. You need a product that can store these features in a library and serve them to you as needed. You can retrieve historical features to train models. You can use the latest available features for inference.
Examples: Tecton, Feast
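To make the store-and-serve idea concrete, here is a minimal sketch of a feature store in plain Python. It is a toy, not how Tecton or Feast work internally. The names FeatureStore, get_latest, get_as_of, and refresh are hypothetical, timestamps are assumed to be integers, and refresh is assumed to be called in increasing timestamp order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]


@dataclass
class FeatureStore:
    """Toy in-memory feature store (hypothetical, for illustration only)."""

    extract: Callable[[str], Features]  # the expensive extraction step
    # entity_id -> list of (timestamp, features), appended in time order
    _rows: Dict[str, List[Tuple[int, Features]]] = field(default_factory=dict)
    extract_calls: int = 0  # counts how often we paid for extraction

    def refresh(self, entity_id: str, now: int) -> Features:
        """Run the extraction and store the result under the given timestamp."""
        self.extract_calls += 1
        features = self.extract(entity_id)
        self._rows.setdefault(entity_id, []).append((now, features))
        return features

    def get_latest(self, entity_id: str, now: int) -> Features:
        """Inference path: reuse the newest stored features, extract only on a miss."""
        history = self._rows.get(entity_id)
        if history:
            return history[-1][1]
        return self.refresh(entity_id, now)

    def get_as_of(self, entity_id: str, timestamp: int) -> Optional[Features]:
        """Training path: features as they were at or before `timestamp`."""
        best = None
        for ts, features in self._rows.get(entity_id, []):
            if ts <= timestamp:
                best = features
        return best


def slow_extract(entity_id: str) -> Features:
    # Stand-in for an expensive feature computation.
    return {"id_length": float(len(entity_id))}


store = FeatureStore(extract=slow_extract)
store.get_latest("user_42", now=100)  # miss: runs the extraction
store.get_latest("user_42", now=101)  # hit: served from the store
```

The second call does not trigger extraction again, which is the whole point. A real product would add persistent storage, freshness rules, and low latency serving on top of this.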
Task #5: Building and deploying ML models
Once you've labeled your data and explored it in notebooks, you need to build the ML model and deploy it into production. You need a product that can help with experiment tracking, data versioning, and model management.
Examples: Weights & Biases, Comet ML, Fiddler AI, Cortex, BentoML
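Here is a rough sketch of what experiment tracking and data versioning boil down to, using only the Python standard library. It is not the API of any product listed above. The names Run, ExperimentLog, and data_version are hypothetical, and hashing the rows is just one simple way to version a dataset.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


def data_version(rows: List[str]) -> str:
    """Short content hash of the training data, so a run can be tied to exact data."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()[:12]


@dataclass
class Run:
    name: str
    params: Dict[str, float]   # hyperparameters used for this run
    metrics: Dict[str, float]  # results, e.g. validation accuracy
    data_version: str          # which data the run was trained on


class ExperimentLog:
    """Keeps every run so you can compare them later."""

    def __init__(self) -> None:
        self.runs: List[Run] = []

    def log(self, run: Run) -> None:
        self.runs.append(run)

    def best(self, metric: str) -> Run:
        """Return the run with the highest value for `metric`."""
        return max(self.runs, key=lambda r: r.metrics[metric])

    def to_jsonl(self) -> str:
        """Serialize all runs, one JSON object per line."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self.runs)


version = data_version(["row one", "row two"])
log = ExperimentLog()
log.log(Run("baseline", {"lr": 0.1}, {"accuracy": 0.81}, version))
log.log(Run("lower_lr", {"lr": 0.01}, {"accuracy": 0.86}, version))
best_run = log.best("accuracy")
```

The numbers in the usage lines are made up for illustration. A real product adds a UI, artifact storage, and team sharing around this same core record.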
Task #6: Monitoring ML models in production
Once the models are deployed, you need to continuously monitor them to make sure they maintain their performance. You need a product that helps ML teams monitor their models that are in production. This includes tracking speed, accuracy, reliability, and drift, deciding when to retrain, and more.
Examples: Arize, WhyLabs
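To give a feel for drift monitoring, here is a small sketch that compares a feature's training distribution against live traffic using the Population Stability Index (PSI). PSI is one common drift measure that I'm swapping in for illustration. The function names, the bin edges, and the 0.25 alert threshold are all assumptions you would tune for your own data.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float],
                  eps: float = 1e-6) -> List[float]:
    """Fraction of values per bin. Bins are (-inf, e0), [e0, e1), ..., [e_last, inf).

    Assumes `edges` is sorted in ascending order. Empty bins are floored
    at `eps` so the logarithm in PSI stays defined.
    """
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * (len(edges) + 1)
    for v in values:
        # Number of edges at or below v is the index of v's bin.
        counts[sum(1 for e in edges if v >= e)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float]) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected: Sequence[float], actual: Sequence[float],
                edges: Sequence[float], threshold: float = 0.25) -> bool:
    """True when PSI exceeds the (assumed) alert threshold."""
    return psi(expected, actual, edges) > threshold


training_values = [1.0, 2.0, 3.0, 4.0]
live_values = [3.0, 4.0, 5.0, 6.0]  # the live data has shifted upward
needs_attention = drift_alert(training_values, live_values, edges=[2.5, 4.5])
```

PSI is zero when the two samples fall into the bins in the same proportions and grows as they diverge. A monitoring product would run a check like this on a schedule, for every feature and for the model's predictions.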
Task #7: Managing the entire ML lifecycle
This product combines multiple functionalities such as data exploration, building models, deployment, and monitoring into a single end-to-end platform. Your entire ML lifecycle can live on these products. The advantage is that it's all in the same platform. The disadvantage is that you're now limited to what this single platform can offer.
Examples: Amazon SageMaker, Azure ML, Google Vertex AI, Databricks, DataRobot, Dataiku, H2O.ai, Abacus AI
Task #8: Understanding image/video input
Let's say you are building a mobile ecommerce app that lets users take a picture of any item they want to buy. Your app needs to recognize the content of that image. You don't need to reinvent the wheel by building an image recognition system from scratch. You need a product with an API that can be called from within your application. You can just send an image to this API service and get the result back immediately.
Examples: Google Cloud Vision API, Amazon Rekognition, Clarifai
Task #9: Understanding text input
Let's say you are building a customer support app that lets users chat with support agents. Your app needs to understand the text content so that it can connect the customer to the right support agent. You don't need to build a text analysis system from scratch. You need a product with an API that can be called from within an application. You can just send the text to this API service and get the result back.
Examples: Hugging Face, Amazon Comprehend, Google Natural Language API
Task #10: Understanding speech input
Let's say you are building a mobile app that lets users use their voice as input. Your app needs to recognize the speech, understand what they're saying, and then take the right action. You don't need to build a speech recognition system from scratch. You need a product with an API that can be called from within an application. You can just send the speech input to this API service and get the result back.
Examples: Deepgram, Cogito, Google Speech-to-Text
Task #11: Generating synthetic training data
To build a good ML model, you need a lot of training data. There are many situations where there isn't enough training data available. That's where a product like this comes into the picture. This product generates synthetic data that can be used to train ML models.
Examples: Parallel Domain, AI.Reverie, Datagen
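As a very small illustration of the idea, here is a sketch that generates labeled 2D points for a two-class problem. The products listed above generate far richer data than this. The function name, the class centers, and the spread are all made up for this example.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def make_synthetic_points(n_per_class: int, seed: int = 0,
                          spread: float = 1.0) -> List[Tuple[Point, int]]:
    """Generate labeled 2D points: class 0 around (-2, 0), class 1 around (2, 0)."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    centers = {0: (-2.0, 0.0), 1: (2.0, 0.0)}
    data: List[Tuple[Point, int]] = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            point = (rng.gauss(cx, spread), rng.gauss(cy, spread))
            data.append((point, label))
    rng.shuffle(data)  # mix the classes so order carries no signal
    return data


dataset = make_synthetic_points(n_per_class=50, seed=7)
```

Because you control the generator, every point comes with a correct label for free, and you can produce as many points as you need.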
Task #12: Generating content
Let's say you're creating training videos for companies and want to customize them to each company's context. You want a system that can read the text, customize it, and then create human-sounding audio. You need a product that generates content using AI for use cases such as education, sales, gaming, customer support, and more.
Examples: DeepBrainAI, Neosapience, Soul Machines
Task #13: Reskilling and upskilling
Many people are entering machine learning and data science from other fields. What can they use to reskill themselves? Or let's say that a data scientist wants to learn more about their field. What can they use to upskill themselves? You need a product that has educational content. ML and data practitioners can use this product to reskill or upskill themselves. It can either aggregate content from ML experts or create its own content library using an in-house team.
Examples: DataCamp, Coursera, Pluralsight
Task #14: Building reputation and finding jobs
As an ML professional, you need to be good at finding jobs. This becomes easier if you have a social standing in the broader ML community. How do you do that? By participating in online ML communities and competitions. This is something a potential employer can look at to gauge you.
For example, let's say you're an active member of an ML community and you've won ML competitions. When you show up for job interviews, you immediately have the upper hand. You don't need to worry too much about solving those puzzle problems. You've already built a reputation as a competent ML professional.
Products in this category enable practitioners to participate in online communities. These products have some kind of metric that quantifies how good a practitioner is, e.g., number of contests won, number of followers, or a score that rewards users for posting useful content.
Examples: Kaggle, LinkedIn, Slack communities, Discord communities, Reddit
Where to go from here
The goal of this post is to show the tasks that appear in an ML practitioner's life. If you're a builder, you can talk to users and find out how they handle each of these tasks today. See what they need help with and build a product that works for them.
🎙 Infinite ML podcast
This podcast features conversations with amazing builders and practitioners in machine learning. It now has listeners in 250 cities across 50 countries. Subscribe to the podcast below and let me know what you think:
🎧 Apple Podcasts
🎧 Google Podcasts
🔍 ML Job Opportunities
Check out this job board for the open roles in Machine Learning.