Evaluating your RAG Application using RAGAS | In 3 Easy Steps
Evaluations are meant for improvement!
Hey folks! In this post, I will take you through understanding and evaluating a Retrieval Augmented Generation (RAG) system.
Need for RAG Evaluation
In the world of LLMs and chatbots, hallucination is the most common disease we are fighting against. Hallucination is usually handled with two common techniques:
- Fine-Tuning for Specific Task
- Retrieval Augmented Generation
Of these two options, RAG systems are the more popular choice.
However, there are quite a lot of options to choose from while building a RAG application, as illustrated in Fig. 1.
As you can see in the figure, there are many options available at every stage of the pipeline. What matters most is choosing the combination that fits your needs.
Here comes the awesome framework, RAGAS (Automated Evaluation of Retrieval Augmented Generation), for evaluating RAG-based apps. It focuses on Metric Driven Development (MDD) to improve the performance of RAG apps. You can read more about the framework here.
Alright, come on! Let’s get our hands dirty!!!
IMPLEMENTATION
1. Install and Import Packages
(NOTE: We’ll use OpenAI’s GPT-4 to evaluate the prepared data, so make sure you are ready with your OpenAI API key.)
Install the packages using your favourite package manager. Here, I am using pip to install and manage the dependencies.
pip install -U -q ragas tqdm datasets
Import the installed packages.
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_relevancy,
    answer_correctness,
    answer_similarity,
)
from ragas.metrics.critique import harmfulness
from ragas import evaluate
I assume that you already have the data to evaluate; if not, feel free to use the sample data below. (Optional)
git clone https://github.com/mahimairaja/sample_ragas_dataset.git
cd sample_ragas_dataset
2. Set Up API Keys and Load Data
Using the API key copied from the OpenAI platform dashboard, set up the API key environment variable. Here, I am passing the value through Colab secrets, so before running the cell, make sure you have assigned the secret variable with your API key value.
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
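If you are not running on Colab, a minimal alternative is to prompt for the key with getpass instead of using Colab secrets:

import os
from getpass import getpass

# Prompt for the key interactively (e.g. in a local Jupyter notebook)
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")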
Here, I am loading the data from a JSON file.
from datasets import load_dataset
ragas_dataset = load_dataset('json', data_files='data.json')
data = ragas_dataset['train']
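For reference, the loaded dataset needs the columns RAGAS expects. Below is a purely hypothetical record (not taken from the sample repository) illustrating the shape of one entry in data.json; depending on your ragas version, the ground-truth field may be named ground_truth or ground_truths.

# Hypothetical example of a single record in data.json (field values are made up)
sample_row = {
    "question": "What does RAGAS evaluate?",
    "answer": "RAGAS scores a RAG pipeline on retrieval and generation quality.",
    "contexts": [  # list of retrieved passages used to produce the answer
        "RAGAS is a framework for metric-driven evaluation of "
        "Retrieval Augmented Generation pipelines."
    ],
    "ground_truth": "RAGAS evaluates both the retriever and the generator of a RAG system.",
}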
3. Evaluate and Visualize
Using the metrics imported from ragas, evaluate the dataset, which should contain the columns question, answer, contexts, and ground_truth.
result = evaluate(
    data,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        context_relevancy,
        answer_correctness,
        answer_similarity,
    ],
    raise_exceptions=False,  # failed rows return NaN instead of aborting the run
)
print(result)
The result I got for evaluating the sample dataset is as below:
{
'context_precision': 0.9000,
'faithfulness': 1.0000,
'answer_relevancy': 0.9245,
'context_recall': 1.0000,
'context_relevancy': 0.1061,
'answer_correctness': 0.6074,
'answer_similarity': 0.9396
}
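The aggregate scores are a handy summary, but per-question scores make it easier to see which samples drag a metric down. In the ragas versions I have worked with, the result object can be converted to a pandas DataFrame; treat the exact method as version-dependent:

# Inspect per-question scores (assumes result.to_pandas() is available in your ragas version)
df = result.to_pandas()
print(df.head())

# Sort by faithfulness to surface the least faithful answers first
print(df.sort_values("faithfulness").head())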
Visualize the computed metric results in a radar plot using Plotly, which comes preinstalled on Colab; otherwise, install it with pip. (Do leave a comment if you feel other plots would suit this better…)
import plotly.graph_objects as go

# Collect the computed scores into a dict for plotting
scores = {
    'context_precision': result['context_precision'],
    'faithfulness': result['faithfulness'],
    'answer_relevancy': result['answer_relevancy'],
    'context_recall': result['context_recall'],
    'context_relevancy': result['context_relevancy'],
    'answer_correctness': result['answer_correctness'],
    'answer_similarity': result['answer_similarity']
}

fig = go.Figure()
fig.add_trace(go.Scatterpolar(
    r=list(scores.values()),
    theta=list(scores.keys()),
    fill='toself',
    name='Ensemble RAG'
))
fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 1]
        )),
    showlegend=True,
    title='Retrieval Augmented Generation - Evaluation',
    width=800,
)
fig.show()
Hurray! Here is our awesome radar plot to visualize the RAG evaluation metrics.
(NOTE: You can use this plot to compare multiple RAG variants.)
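For example, if you evaluated a second pipeline, say a reranker variant, and collected its scores into another dict, you could overlay it as an extra trace. The numbers and the variant name below are purely illustrative:

# Hypothetical scores for a second RAG variant, used only to illustrate overlaying traces
reranker_scores = {
    'context_precision': 0.95,
    'faithfulness': 0.97,
    'answer_relevancy': 0.91,
    'context_recall': 0.98,
    'context_relevancy': 0.22,
    'answer_correctness': 0.68,
    'answer_similarity': 0.94,
}

fig.add_trace(go.Scatterpolar(
    r=list(reranker_scores.values()),
    theta=list(reranker_scores.keys()),
    fill='toself',
    name='Reranker RAG'  # hypothetical variant name
))
fig.show()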
Thanks for reading!
If you are interested in building your own dataset for RAGAS evaluation from scratch, stay tuned for the next blog!!!!
You can find the complete code at the end of the page. See you Again…