Retrieval Augmented Generation

Large Language Models (LLMs), although powerful in many ways, have inherent limitations when it comes to generating verifiable responses grounded in real-world data. Their knowledge is limited to their training data, which makes them prone to nonsensical, biased, out-of-context, and sometimes outdated responses. Retrieval Augmented Generation (RAG) mitigates this by introducing fact retrieval from external sources into the generation process.

What is RAG?

Retrieval Augmented Generation (RAG) is a technique that has an LLM consult an authoritative, pre-determined knowledge base before generating a response. This knowledge base could be a reputable online source like NASA or the CDC, or a company’s internal database. RAG aims to connect LLMs to up-to-date, real-world factual information to make their output more reliable, trustworthy, and accurate.

Without RAG, a user will provide a text prompt to an LLM and the model will return a response based on its training data. While this is fine for general use cases, it becomes a problem when:

  • You need a specific response from your LLM-powered solution based on up-to-date data. Training data can go stale, causing an LLM to give you an incorrect response. For instance, in 2023 astronomers announced enough newly discovered moons of Saturn to put it well ahead of Jupiter’s count; a model trained only on older data would give you an outdated answer.

  • You need a specific response from the LLM, grounded in your enterprise data. 

So how does RAG mitigate this? It adds a fact retrieval component to this process. The user’s input will be used to pull information from a relevant source (trusted online sources or enterprise data). This information and the user’s prompt will be fed to the LLM. Finally, the LLM will use the two to construct an accurate response.

How Retrieval Augmented Generation Works

Adding RAG to an LLM’s workflow involves six key steps. Another AI technique, vector embeddings, plays a crucial role in this workflow: embeddings are what make it possible to retrieve relevant information based on document-to-query similarity, helping the system identify relevant content in your chosen source, whether it is a public online source or a company’s internal database. With that in mind, here is how RAG works.

1: Preparing and storing data in a vector database (Pre-step)

Before this workflow begins, you will prepare the data the LLM will use as a reference. You can get this data from available sources through APIs, for example, or from the company’s internal database.

Next, a type of AI model called an embedding model will transform this information into vector representations. Simply put, textual data will be converted into numerical data that can be easily searched, matched, and retrieved from the vector database.
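
As a rough sketch of this pre-step, the snippet below uses the open-source sentence-transformers library for the embedding model and a plain in-memory array as a stand-in for a real vector database; the sample documents and model name are placeholders.

```python
# Sketch of the pre-step: embed reference documents and store the vectors.
# sentence-transformers is one open-source option for the embedding model;
# a plain array stands in for a dedicated vector database here.
from sentence_transformers import SentenceTransformer

# Placeholder documents pulled from an API or an internal database.
documents = [
    "Employees accrue 1.75 days of annual leave per month of service.",
    "Unused annual leave can be carried over for up to 12 months.",
]

embed_model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

# Convert each document into its vector representation (numerical data).
chunk_vectors = embed_model.encode(documents, normalize_embeddings=True)

print(chunk_vectors.shape)  # (number of documents, embedding dimension)
```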

2: User provides input

A user will provide a prompt with a question or instruction. Take a company’s AI-powered chatbot as an example: an employee might ask a question like "How much annual leave do I have left?" The following steps highlight what will happen in the background to generate an accurate response.

3: Retrieving relevant information

The chatbot will parse this question and forward it to the embedding model. The query will be converted into its vector representation, which will be used to perform a semantic search in the vector database. Once relevant information is identified, the system will retrieve it from the knowledge base.
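
Continuing the sketch from the pre-step, retrieval amounts to embedding the query with the same model and ranking the stored vectors by similarity. A dedicated vector database would run this search for you; cosine similarity over an in-memory matrix is used here just to show the idea.

```python
# Sketch of semantic retrieval: embed the query and rank stored document
# vectors by cosine similarity. Repeats the pre-step setup so it runs alone.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Employees accrue 1.75 days of annual leave per month of service.",
    "Unused annual leave can be carried over for up to 12 months.",
]
embed_model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vectors = embed_model.encode(documents, normalize_embeddings=True)

query = "How much annual leave do I have left?"
query_vector = embed_model.encode([query], normalize_embeddings=True)[0]

# With normalized vectors, the dot product equals cosine similarity.
scores = chunk_vectors @ query_vector
top_k = np.argsort(scores)[::-1][:2]   # indices of the best-matching documents
retrieved = [documents[i] for i in top_k]
print(retrieved)
```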

4: Augmented prompt generation

The retrieved information is then combined (augmented) with the user’s original prompt and fed to the LLM. This creates a richer, more informative context for the LLM to draw upon during generation. 
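
In code, this augmentation step is often little more than string assembly: the retrieved passages are placed next to the user’s question in a single prompt. The template below is one plausible wording, not a standard.

```python
# Sketch of prompt augmentation: combine the retrieved passages with the
# user's original question into one prompt. The wording is illustrative.
retrieved = [
    "Employees accrue 1.75 days of annual leave per month of service.",
    "Unused annual leave can be carried over for up to 12 months.",
]
question = "How much annual leave do I have left?"

context = "\n".join(f"- {passage}" for passage in retrieved)
augmented_prompt = (
    "Answer the question using only the context below. "
    "If the context is insufficient, say so.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
print(augmented_prompt)
```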

5: Response generation

The LLM leverages the knowledge gained from its training data along with the augmented prompt to generate a response. Drawing on this combined knowledge, it incorporates the retrieved facts and addresses the user’s intent.
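
At this point the augmented prompt is sent to the model like any other prompt. The sketch below assumes the OpenAI Python client and an OpenAI-compatible chat completions endpoint; the model name is a placeholder, and any other LLM API could fill the same role.

```python
# Sketch of the generation step, assuming the OpenAI Python client and an
# OpenAI-compatible chat completions endpoint; any LLM API could be used.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

augmented_prompt = "Answer the question using only the context below. ..."  # built in step 4

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": augmented_prompt}],
)
print(response.choices[0].message.content)
```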

6: Source attribution

Although this step is often optional, many RAG implementations enable the LLM to cite the sources of the retrieved information within its response. These citations could be links to specific paragraphs or footnotes similar to those in research papers.
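
One way to support this, sketched below, is to store a source identifier alongside each retrieved chunk and instruct the model to cite those identifiers in its answer; the source names and prompt wording here are illustrative assumptions.

```python
# Sketch of source attribution: keep source metadata next to each retrieved
# chunk and ask the model to cite the numbered sources it relied on.
# The source names and prompt wording are illustrative.
retrieved = [
    {"text": "Employees accrue 1.75 days of annual leave per month.",
     "source": "hr-handbook.pdf, p. 12"},
    {"text": "Unused leave can be carried over for up to 12 months.",
     "source": "leave-policy-2024.md"},
]
question = "How much annual leave do I have left?"

context = "\n".join(
    f"[{i + 1}] {chunk['text']} (source: {chunk['source']})"
    for i, chunk in enumerate(retrieved)
)
augmented_prompt = (
    "Answer using only the numbered context below and cite the numbers "
    "you relied on, e.g. [1].\n\n"
    f"{context}\n\nQuestion: {question}"
)
# The prompt is then sent to the LLM exactly as in the previous step.
print(augmented_prompt)
```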

The Benefits of RAG

Implementing RAG in LLM-powered content generation brings several benefits, including the following.

Improved factual accuracy

RAG grounds an LLM’s output in real-world factual information. It enables LLMs to access and leverage information from trusted online sources or a company’s knowledge base. This helps keep the generated outputs factually accurate and up-to-date, making the LLM less prone to hallucinations.

More reliable output

LLMs are trained on massive amounts of data, but this data might not always be accurate or complete. If an LLM encounters a question outside its training data, it could make up information to fill in the gaps. This also happens when a user’s prompt lacks sufficient context. 

RAG helps LLMs generate reliable outputs in the following way:

  • The system checks for and retrieves facts from reliable sources before responding.

  • It then augments the user’s prompt with this information to give the LLM rich context for understanding the user’s intent.

Insight into the LLM’s reasoning

As LLMs are increasingly used in critical tasks, it is important to understand their reasoning to ensure they are making reliable and trustworthy decisions. Although these models are complex and, therefore, not easily interpretable, implementing RAG can get you halfway to understanding how they arrive at their outputs.

With RAG, LLMs can cite the sources of information their output is derived from. Users can verify the information presented and understand the reasoning behind an LLM’s response. 

Ability to use a variety of knowledge sources

You can use RAG to connect practically any LLM to any external source of information, from trusted global sources like the WHO to a company’s internal database. This makes LLMs useful for a wide variety of specific applications.

On top of all this, RAG is a cost-effective way to ensure an LLM’s output remains relevant, accurate, and useful in various contexts. You don’t have to spend massive computational resources retraining or fine-tuning the LLM to achieve this.

The Challenges of RAG Implementation

Although RAG is intended to make LLM outputs more reliable, relevant, and factually accurate, reaping these benefits is not a given. Several challenges can prevent this, and understanding them is the first step to addressing them proactively and leveraging RAG effectively. So, here are some challenges facing RAG implementation. 

Outdated or biased information in knowledge bases

The LLM’s output is only as good as the knowledge base that gives it facts and context. If this source is outdated or contains biased information, the quality of the output will be low. It will likely generate a biased response or one that is no longer correct because the information is outdated.

Dependence on the retrieval system

RAG depends on its retrieval system to find accurate information that is relevant to the user’s query and deliver it efficiently for the LLM to use. If the retrieval system fails at this, RAG becomes ineffective: it might surface factually correct information, but if that information is irrelevant or misses the context, it won’t be helpful to the user.

Imagine an LLM-powered app that helps potential customers discover new places to dine. Let’s say a user asks “What is the best pizza place in town?”, and the RAG system retrieves a list of the best pizza places for the entire country. 

Although this is factually correct information, the response won’t be helpful to the user. The system failed to retrieve the best pizza places for that town. 
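
One common mitigation, sketched below, is to attach metadata (such as a city) to each stored document and filter candidates on it before running the similarity search; the field names and sample entries here are hypothetical.

```python
# Sketch of metadata filtering: restrict candidates to the user's city before
# the similarity ranking. The "city" field and entries are hypothetical.
places = [
    {"text": "Luigi's wood-fired pizza, praised for its margherita.",
     "city": "Springfield"},
    {"text": "Capital Pizza Co., a national award winner.",
     "city": "Shelbyville"},
]
user_city = "Springfield"

# Keep only documents tagged with the user's city, then run the usual
# embedding-based similarity search over this smaller set.
local_candidates = [place for place in places if place["city"] == user_city]
print([place["text"] for place in local_candidates])
```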

Complexity in implementation

Connecting the LLM to a knowledge base can be complicated. For instance, you might need to build endpoints for a SaaS app to access specific data and feed it to the LLM on an ongoing basis, or build a scraper that periodically copies the text on a site into the knowledge base. In any case, this process is complex and requires significant technical resources.
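
As a rough illustration, a recurring ingestion job might fetch a page, strip the markup, and hand the text to the embedding pipeline from the pre-step. The sketch below uses the requests and BeautifulSoup libraries with a placeholder URL; a production scraper would also need scheduling, deduplication, and permission to scrape the site.

```python
# Rough sketch of a recurring ingestion job: fetch a page, strip the markup,
# and hand the text to the embedding pipeline from the pre-step.
# The URL is a placeholder; a real scraper also needs scheduling,
# deduplication, and respect for the site's terms and robots.txt.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/policies"  # placeholder source page
html = requests.get(url, timeout=30).text
text = BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)

# `text` would then be chunked, embedded, and added to the vector database.
print(text[:500])
```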

Use Case Example: A Customer Support Generative AI Chatbot for Banks

Suppose a bank has a customer service chatbot powered by an LLM. A customer might ask it a question about their eligibility for a given loan product. The chatbot might struggle with the specifics if it relies solely on its training data. However, with RAG, the chatbot can retrieve relevant, up-to-date information on the loan product.

It will combine this with the customer’s creditworthiness and bank policies retrieved from the bank’s internal database to provide an accurate answer to the customer. 

Summary

RAG allows LLM-powered applications to retrieve up-to-date, real-world information and use it to augment a user’s prompt before generating a response. This improves the accuracy, reliability, and trustworthiness of their output and reduces an LLM’s tendency to make up information to fill in gaps.

This opens up LLM-powered apps to various use cases beyond what the model was trained on. Thanks to these benefits, RAG implementation is becoming standard practice for generative AI-powered applications.

