What is Retrieval Augmented Generation
Thoughts on how Retrieval Augmented Generation can help build enterprise-ready LLM applications
Welcome to Infinite Curiosity, a weekly newsletter that explores the intersection of Artificial Intelligence and Startups. Tech enthusiasts across 200 countries have been reading what I write. Subscribe to this newsletter for free to receive it in your inbox every week:
Hello friends,
Large Language Models (LLMs) have seen phenomenal growth. LLMs are trained on large amounts of unstructured text data and they generate new text that has the characteristics of the text they were trained on. But there's a key problem when it comes to companies using them for their business: hallucination. LLMs just make stuff up! Here’s what a happy robot looks like when it’s hallucinating:
For example, a financial analyst cannot take the output of an LLM and plug it as-is into an official report. They have to fact-check it.
Another issue is that LLMs are not great when it comes to domain-specific work. If an LLM is being used by a legal professional who works in a regulated environment, they would want it to draw on their own documents to provide an accurate answer.
There are two ways in which you can make an LLM perform tasks specific to you:
You can fine-tune the model until it gets good at answering questions about your own data. But the issue is that it might still hallucinate.
You can insert text data into the prompt as a way to provide facts. But the LLM is limited by its context length: there's a cap on how much text you can fit into the prompt, and that cap is usually not high enough. A minimal sketch of this approach is below.
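Here's a rough sketch of the prompt-stuffing approach, assuming the openai Python client. The model name, the context limit, and the 4-characters-per-token estimate are illustrative, not exact:

```python
# A minimal sketch of the "stuff facts into the prompt" approach.
# Assumes the openai Python client and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

CONTEXT_LIMIT_TOKENS = 8_000  # hypothetical context window for the model


def answer_with_context(question: str, document: str) -> str:
    # Crude token estimate: roughly 4 characters per token for English text.
    approx_tokens = len(document) // 4
    if approx_tokens > CONTEXT_LIMIT_TOKENS:
        # The whole document doesn't fit -- this is the limitation described above.
        document = document[: CONTEXT_LIMIT_TOKENS * 4]

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": "Answer using only the provided document."},
            {"role": "user", "content": f"Document:\n{document}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

As soon as your documents grow beyond the context window, you're forced to truncate or drop facts, which is exactly the limitation described above.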
How do we address it? This is where Retrieval Augmented Generation comes into the picture.
What is Retrieval Augmented Generation?
Retrieval Augmented Generation (RAG) is a technique for generating text that combines two different approaches: Retrieval and Generation (Shocking! I know). Retrieval involves finding relevant information from known sources of data (e.g. documents, APIs, databases) and Generation involves creating new text from scratch.
RAG works by first retrieving relevant information from the known sources of data. Those sources are converted into numerical representations known as vector embeddings (I've talked more about them in this post), which make it possible to find the passages most relevant to a query. The retrieved text is then fed into the Generation process as context.
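To make the flow concrete, here's a minimal RAG sketch assuming the openai Python client and numpy. The model names and the toy in-memory document list are illustrative; a production system would use a proper vector database instead:

```python
# A minimal RAG sketch: embed documents, retrieve the most relevant ones,
# then generate an answer grounded in the retrieved text.
# Assumes the openai Python client, numpy, and an OPENAI_API_KEY in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()


def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)


# 1. Indexing: embed each known document once and keep the vectors.
documents = [
    "The X100 camera has a 40 MP sensor and weighs 520 g.",  # hypothetical facts
    "The X100 ships with a fixed 23 mm f/2 lens.",
]
doc_vectors = [embed(doc) for doc in documents]


def retrieve(query: str, k: int = 2) -> list[str]:
    # 2. Retrieval: rank documents by cosine similarity to the query embedding.
    q = embed(query)
    scores = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]


def generate(query: str) -> str:
    # 3. Generation: the retrieved facts are inserted into the prompt as context.
    context = "\n".join(retrieve(query))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": "Answer using only the facts below."},
            {"role": "user", "content": f"Facts:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content


print(generate("How heavy is the X100 and what lens does it have?"))
```

The in-memory list and cosine-similarity loop stand in for the vector database mentioned later; the overall retrieve-then-generate structure is the same.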
For example, let's say you are trying to create an article about a new technical product. The Retrieval component will pull information about the product from the company's documents, website, and reviews. This information is then used to generate the article.
Why bother with RAG? Why not just generate text without it?
You can certainly generate text with a generic LLM, but RAG has several advantages over it. As we've discussed, the biggest issue with LLMs is that they hallucinate. They make stuff up! Just like humans do sometimes (okay they do most of the time). When you publish an article about a product, you need to stick to facts. You can't make false claims! And you need to keep it interesting too. RAG can help with both.
In this case, RAG keeps the text accurate and produces better-quality output than a generic LLM, because the Retrieval component grounds the generation in the actual product specs.
RAG can also speed up the Generation process. Why? Because the Retrieval component gives the model a starting point; you don't need to brute-force it. This saves time and effort. RAG helps reduce cost as well: since generation is grounded in a small set of retrieved documents, we limit the amount of text the model has to process.
RAG hasn't been around for long, so people are still experimenting with it. But it's already being used in a variety of enterprise applications, and it has the potential to revolutionize the way LLMs are used in the enterprise.
Is RAG the elixir we've all been looking for?
An elixir is a magical potion that cures everything. RAG is not there yet! There's a set of challenges we need to be aware of:
It requires access to a large amount of text that has all the required facts
It can get expensive as you increase the size of the domain-specific dataset
The cost of maintaining a vector database (which stores the embeddings used for retrieval) is non-trivial
The RAG process can be difficult to evaluate
The likelihood of hallucination is lower when you use RAG, but it's not zero. You still need to verify the text that gets generated.
This is an exciting avenue that can make LLMs very useful for large companies. If you’re a founder building an AI-infused product for a specific vertical, I’d love to hear from you. My DMs are open on LinkedIn and Twitter.
If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 friend who’s curious about AI: