Activeloop Deep Memory is a suite of tools that enables you to optimize your Vector Store for your use-case and achieve higher accuracy in your LLM apps.
Retrieval-Augmented Generation (RAG) has recently gained significant attention. As advanced RAG techniques and agents emerge, they expand the potential of what RAGs can accomplish. However, several challenges may limit the integration of RAGs into production. The primary factors to consider when implementing RAGs in production settings are accuracy (recall), cost, and latency. For basic use cases, OpenAI's Ada model paired with a naive similarity search can produce satisfactory results. Yet, for higher accuracy or recall during searches, one might need to employ advanced retrieval techniques. These methods might involve varying data chunk sizes, rewriting queries multiple times, and more, potentially increasing latency and costs. Activeloop's Deep Memory, a feature available to Activeloop Deep Lake users, addresses these issues by introducing a tiny neural network layer trained to match user queries with relevant data from a corpus. While this addition incurs minimal latency during search, it can boost retrieval accuracy by up to 27% and remains cost-effective and simple to use, without requiring any additional advanced RAG techniques.
For this tutorial, we will parse Deep Lake documentation and create a RAG system that can answer questions from the docs.
1. Dataset Creation
We will parse Activeloop's docs for this tutorial using the BeautifulSoup library and LangChain's document parsers, such as Html2TextTransformer and AsyncHtmlLoader. So we will need to install the following libraries: BeautifulSoup, html2text, langchain, and deeplake.
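A minimal install command might look like this (the exact package set is an assumption based on the tools mentioned above; langchain-openai is only needed if you use OpenAI models for embeddings and query generation):

```bash
pip install --upgrade --quiet beautifulsoup4 html2text deeplake langchain langchain-community langchain-openai
```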
Converting data into a user-readable format:
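Below is a sketch of what this step could look like. The docs entry-point URL, dataset path, chunk sizes, and the use of OpenAI embeddings are assumptions, not prescribed values:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import Html2TextTransformer
from langchain_community.vectorstores import DeepLake
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


def get_all_links(url: str) -> list[str]:
    # Collect absolute links from a documentation index page.
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    return [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]


# Entry point is an assumption; point this at the docs you want to index.
all_links = get_all_links("https://docs.deeplake.ai/en/latest/")

# Fetch the raw HTML for every page.
loader = AsyncHtmlLoader(all_links)
docs = loader.load()

# Convert HTML into plain, user-readable text.
docs_transformed = Html2TextTransformer().transform_documents(docs)

# Split pages into retrieval-sized chunks (sizes are illustrative).
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs_transformed)

# Ingest the corpus into a Deep Lake vector store. The dataset path is a
# placeholder; the tensor_db runtime is needed for Deep Memory.
db = DeepLake(
    dataset_path="hub://<org_id>/deeplake-docs-deepmemory",
    embedding=OpenAIEmbeddings(),
    runtime={"tensor_db": True},
)
db.add_documents(chunks)
```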
2. Generating synthetic queries and training Deep Memory
The next step is to train a deep_memory model that will align your users' queries with the dataset you already have. If you don't have any user queries yet, no worries, we will generate them using an LLM!
In order to train deep_memory, you need queries and relevance together with corpus data (the data we want to query). The corpus was already populated in the previous section; here we will generate the questions and relevance.

- questions - a list of strings, where each string represents a query.
- relevance - contains links to the ground truth for each question. There might be several docs that contain the answer to a given question. Because of this, relevance is a List[List[tuple[str, float]]], where the outer list represents queries and the inner list relevant documents. Each tuple is a (str, float) pair: the string is the id of the source doc (corresponding to the id tensor in the dataset), while the float indicates how relevant that document is to the question.
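Here is a hedged sketch of how the queries and relevance could be generated and training kicked off. The prompt, sample size, and choice of ChatOpenAI are assumptions; db is the Deep Lake vector store populated above:

```python
import random

from langchain_openai import ChatOpenAI

# Fetch the corpus texts and their ids from the existing vector store.
docs = db.vectorstore.dataset.text.data(fetch_chunks=True, aslist=True)["value"]
ids = db.vectorstore.dataset.id.data(fetch_chunks=True, aslist=True)["value"]

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

questions = []
relevance = []

# Sample documents and ask the LLM to write one question each doc answers.
for doc_id, text in random.sample(list(zip(ids, docs)), k=min(100, len(docs))):
    question = llm.invoke(
        "Generate one concise question that the following text answers:\n\n"
        + text[:2000]
    ).content
    questions.append(question)
    # Full relevance (1.0) to the source document the question came from.
    relevance.append([(doc_id, 1.0)])

# Kick off Deep Memory training; this runs asynchronously on Activeloop's side.
job_id = db.vectorstore.deep_memory.train(
    queries=questions,
    relevance=relevance,
)
```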
3. Evaluating Deep Memory performance
Great, we've trained the model! It's showing substantial improvement in recall, but how can we use it now and evaluate it on unseen new data? In this section we will delve into model evaluation and inference, and see how it can be used with LangChain to increase retrieval accuracy.

3.1 Deep Memory evaluation
To begin with, we can use deep_memory's built-in evaluation method, which calculates several recall metrics. It can be done easily in a few lines of code:
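A minimal sketch, assuming test_questions and test_relevance are a held-out split built the same way as the training data above:

```python
# Evaluate recall with and without Deep Memory on unseen queries.
recall = db.vectorstore.deep_memory.evaluate(
    queries=test_questions,
    relevance=test_relevance,
)
```

Once the metrics look good, the trained model can be used at inference time by passing deep_memory=True to the search call:

```python
# Retrieve with the trained Deep Memory model enabled.
docs = db.similarity_search(
    "how to create a deeplake dataset?", k=10, deep_memory=True
)
```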