DataStax Astra DB is a serverless, AI-ready database built on Apache Cassandra® and made conveniently available through an easy-to-use JSON API.
Setup
Dependencies
Use of the integration requires the langchain-astradb partner package, which can be installed with pip (pip install langchain-astradb).
Credentials
In order to use the AstraDB vector store, you must first head to the AstraDB website, create an account, and then create a new database; the initialization might take a few minutes. Once the database has been initialized, retrieve your connection secrets, which you’ll need momentarily. These are:
- an API Endpoint, such as "https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com/"
- and a Database Token, e.g. "AstraCS:aBcD123......"
You may optionally provide a keyspace (called “namespace” in the LangChain components), which you can manage from the Data Explorer tab of your database dashboard. If you wish, you can leave it empty and fall back to a default keyspace.
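As a minimal sketch of gathering these values, the snippet below reads them from environment variables rather than from an interactive prompt. The variable names (ASTRA_DB_API_ENDPOINT, ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_KEYSPACE) and the helper load_astra_credentials are assumptions made for illustration; neither Astra DB nor LangChain requires them.

```python
import os
from typing import Dict, Mapping, Optional


def load_astra_credentials(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Optional[str]]:
    """Collect the connection secrets from a mapping of environment variables.

    The variable names are assumptions made for this sketch.
    """
    api_endpoint = env.get("ASTRA_DB_API_ENDPOINT", "").strip()
    token = env.get("ASTRA_DB_APPLICATION_TOKEN", "").strip()
    # An empty or missing keyspace becomes None, i.e. "use the default keyspace".
    keyspace = env.get("ASTRA_DB_KEYSPACE", "").strip() or None

    if not api_endpoint:
        raise ValueError("ASTRA_DB_API_ENDPOINT is not set")
    if not token:
        raise ValueError("ASTRA_DB_APPLICATION_TOKEN is not set")

    # Keys mirror the AstraDBVectorStore keyword arguments
    # api_endpoint, token and namespace.
    return {"api_endpoint": api_endpoint, "token": token, "namespace": keyspace}
```

Returning None for an empty keyspace keeps the "fall back to the default keyspace" behavior explicit in the calling code.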
Initialization
There are various ways to create an Astra DB vector store.
Method 1: Explicit embeddings
You can separately instantiate a langchain_core.embeddings.Embeddings class and pass it to the AstraDBVectorStore constructor, just like with most other LangChain vector stores.
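A hedged sketch of this method follows. The helper names and the collection name "astra_vector_langchain" are hypothetical, and the embedding argument stands for any Embeddings implementation you have instantiated yourself. The import of AstraDBVectorStore is deferred so that the argument-building helper can be used without the package installed.

```python
from typing import Any, Dict, Optional


def explicit_embedding_store_kwargs(
    embedding: Any,
    api_endpoint: str,
    token: str,
    namespace: Optional[str] = None,
    collection_name: str = "astra_vector_langchain",  # hypothetical name
) -> Dict[str, Any]:
    """Constructor arguments for a store whose embeddings are computed client-side."""
    return {
        "collection_name": collection_name,
        "embedding": embedding,  # an Embeddings instance created by the caller
        "api_endpoint": api_endpoint,
        "token": token,
        "namespace": namespace,  # None falls back to the default keyspace
    }


def create_store(**store_kwargs: Any) -> Any:
    """Instantiate the store; needs langchain-astradb and a reachable database."""
    # Imported lazily so the helper above works without the package installed.
    from langchain_astradb import AstraDBVectorStore

    return AstraDBVectorStore(**store_kwargs)
```

With real credentials this would be used as create_store(**explicit_embedding_store_kwargs(my_embeddings, endpoint, token)), where my_embeddings, endpoint and token are placeholders for your own objects.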
Method 2: Server-side embeddings (‘vectorize’)
Alternatively, you can use the server-side embedding computation feature of Astra DB (‘vectorize’) and simply specify an embedding model when creating the server infrastructure for the store. The embedding computations will then be entirely handled within the database in subsequent read and write operations. (To proceed with this method, you must have enabled the desired embedding integration for your database, as described in the docs.)
Method 3: Auto-detect from a pre-existing collection
You may already have a collection in your Astra DB, possibly pre-populated with data through other means (e.g. via the Astra UI or a third-party application), and just want to start querying it within LangChain. In this case, the right approach is to enable the autodetect_collection mode in the vector store constructor and let the class figure out the details. (Of course, if your collection has no ‘vectorize’, you still need to provide an Embeddings object.)
A note on “hybrid search”
Astra DB vector stores support metadata filtering in vector searches; furthermore, version 0.6 introduced full support for hybrid search through the findAndRerank database primitive: documents are retrieved from both a vector-similarity and a keyword-based (“lexical”) search, and are then merged through a reranker model. This search strategy, handled entirely on the server side, can boost the accuracy of your results, thus improving the quality of your RAG application. Whenever available, hybrid search is used automatically by the vector store (though you can exert manual control over it if you wish to do so).
Additional information
The AstraDBVectorStore can be configured in many ways; see the API Reference for a full guide covering e.g. asynchronous initialization; non-Astra-DB databases; custom indexing allow-/deny-lists; manual hybrid-search control; and much more.
Explicit embedding initialization (method 1)
Instantiate the vector store by passing an explicit embedding object to the constructor.
Server-side embedding initialization (“vectorize”, method 2)
The example for this method assumes that you have:
- Enabled the OpenAI integration in your Astra DB organization,
- Added an API Key named "OPENAI_API_KEY" to the integration, and scoped it to the database you are using.
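Under the two assumptions listed above, a server-side embedding setup might look like the sketch below. The model name "text-embedding-3-small", the collection name and the helper names are illustrative assumptions. The snippet also assumes a recent astrapy release that exposes VectorServiceOptions in astrapy.info; check the API reference for the version you have installed.

```python
from typing import Any, Dict, Optional


def openai_vectorize_settings(
    model_name: str = "text-embedding-3-small",  # assumed model choice
    api_key_name: str = "OPENAI_API_KEY",  # name given to the key in the integration
) -> Dict[str, Any]:
    """Settings describing which server-side embedding service to use."""
    return {
        "provider": "openai",
        "model_name": model_name,
        "authentication": {"providerKey": api_key_name},
    }


def create_vectorize_store(
    api_endpoint: str,
    token: str,
    namespace: Optional[str] = None,
    collection_name: str = "astra_vectorize_langchain",  # hypothetical name
) -> Any:
    """Instantiate a store that delegates embedding computation to the database."""
    # Imported lazily: both packages are needed only when a store is created.
    from astrapy.info import VectorServiceOptions
    from langchain_astradb import AstraDBVectorStore

    return AstraDBVectorStore(
        collection_name=collection_name,
        api_endpoint=api_endpoint,
        token=token,
        namespace=namespace,
        # No `embedding` argument: the database computes the embeddings itself.
        collection_vector_service_options=VectorServiceOptions(
            **openai_vectorize_settings()
        ),
    )
```

Note that the key is referenced by the name stored in the integration, so no OpenAI secret appears in the client code.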
Auto-detect initialization (method 3)
You can use this pattern if the collection already exists on the database and your AstraDBVectorStore needs to use it (for reads and writes). The LangChain component will inspect the collection and figure out the details.
This is the recommended approach if the collection has been created and, most importantly, populated by tools other than LangChain, for example if the data has been ingested through the Astra DB Web interface.
Auto-detect mode cannot coexist with collection settings (such as the similarity metric); on the other hand, if no server-side embeddings are employed, one still needs to pass an Embeddings object to the constructor.
In the example for this method, we “auto-detect” the very same collection that was created by method 2 above (“vectorize”). Hence, no Embeddings object needs to be supplied.
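The following sketch shows one way to assemble the constructor arguments for auto-detect mode. The helper names and the local conflict check are assumptions added for illustration: the check only mirrors the restriction described above for two known collection settings and is not the library's own validation.

```python
from typing import Any, Dict, Optional

# Collection-level settings rejected by this local check (not an exhaustive list).
_COLLECTION_SETTINGS = ("metric", "collection_vector_service_options")


def autodetect_store_kwargs(
    collection_name: str,
    api_endpoint: str,
    token: str,
    namespace: Optional[str] = None,
    embedding: Optional[Any] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Constructor arguments for a store that inspects an existing collection."""
    conflicts = sorted(name for name in extra if name in _COLLECTION_SETTINGS)
    if conflicts:
        raise ValueError(f"Not allowed together with auto-detect: {conflicts}")

    kwargs: Dict[str, Any] = {
        "collection_name": collection_name,
        "api_endpoint": api_endpoint,
        "token": token,
        "namespace": namespace,
        "autodetect_collection": True,
        **extra,
    }
    # Only needed when the collection has no server-side embeddings.
    if embedding is not None:
        kwargs["embedding"] = embedding
    return kwargs


def create_autodetected_store(**store_kwargs: Any) -> Any:
    """Instantiate the store; needs langchain-astradb and an existing collection."""
    from langchain_astradb import AstraDBVectorStore  # lazy import

    return AstraDBVectorStore(**store_kwargs)
```

For a ‘vectorize’ collection, the embedding argument is simply left out, as in the default call.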
Manage vector store
Once you have created your vector store, interact with it by adding and deleting different items. All interactions with the vector store work the same way regardless of the initialization method, so you can try them on whichever vector store you have created.
Add items to vector store
Add documents to the vector store by using the add_documents method.
The “id” field can be supplied separately, in a matching ids=[...] parameter to add_documents, or even left out entirely to let the store generate IDs.
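A small sketch of both options (explicit IDs and store-generated IDs) is given below. The helper names are hypothetical; vector_store stands for any store created with one of the methods above, and the Document class is imported lazily from langchain_core.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def make_documents(records: Sequence[Tuple[str, Dict[str, Any]]]) -> List[Any]:
    """Turn (text, metadata) pairs into LangChain Document objects."""
    from langchain_core.documents import Document  # lazy import

    return [
        Document(page_content=text, metadata=metadata) for text, metadata in records
    ]


def add_with_optional_ids(
    vector_store: Any,
    documents: Sequence[Any],
    ids: Optional[Sequence[str]] = None,
) -> List[str]:
    """Insert documents, passing `ids` only when the caller supplies them."""
    if ids is None:
        # Let the store generate the IDs.
        return vector_store.add_documents(documents=list(documents))
    if len(ids) != len(documents):
        raise ValueError("ids must match documents one-to-one")
    return vector_store.add_documents(documents=list(documents), ids=list(ids))
```

The returned list holds the IDs of the inserted items, which is handy when they were generated by the store and are needed later for deletion.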
Delete items from vector store
Delete items by ID by using the delete method.
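As a brief illustration, the wrapper below forwards a collection of IDs to the store's delete method. The wrapper name and the early return for an empty input are choices made for this sketch, not library behavior.

```python
from typing import Any, Iterable, Optional


def delete_entries(vector_store: Any, ids: Iterable[str]) -> Optional[bool]:
    """Delete the given IDs from the store; do nothing if no IDs are given."""
    id_list = list(ids)
    if not id_list:
        # Nothing to delete: skip the database call entirely.
        return None
    return vector_store.delete(ids=id_list)
```

Accepting any iterable makes it easy to pass the IDs returned by an earlier insertion directly.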
Query the vector store
Once the vector store is created and populated, you can query it (e.g. as part of your chain or agent).
Query directly
Similarity search
Search for documents similar to a provided text, with additional metadata filters if desired.
Similarity search with score
You can return the similarity score as well.
Specify a different keyword query (requires hybrid search)
Note: this only works if the collection supports the find-and-rerank command and if the vector store is aware of this fact.
If the vector store is using a hybrid-enabled collection and has detected this fact, by default it will use that capability when running searches. In that case, the same query text is used for both the vector-similarity and the lexical-based retrieval steps in the find-and-rerank process, unless you explicitly provide a different query for the latter.
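The direct query styles described in this section (filtered similarity search, search with score, and a separate keyword query) can be sketched together as follows. The wrapper names are hypothetical. The lexical_query keyword is assumed to be the parameter through which the store accepts a distinct keyword query in hybrid-enabled setups; verify it against the API reference for your installed version.

```python
from typing import Any, Dict, List, Optional, Tuple


def search(
    vector_store: Any,
    query: str,
    k: int = 3,
    metadata_filter: Optional[Dict[str, Any]] = None,
    lexical_query: Optional[str] = None,
) -> List[Any]:
    """Similarity search with an optional metadata filter and keyword query."""
    kwargs: Dict[str, Any] = {"k": k}
    if metadata_filter is not None:
        kwargs["filter"] = metadata_filter
    if lexical_query is not None:
        # Only meaningful when the collection supports find-and-rerank.
        kwargs["lexical_query"] = lexical_query
    return vector_store.similarity_search(query, **kwargs)


def search_with_score(
    vector_store: Any,
    query: str,
    k: int = 3,
    metadata_filter: Optional[Dict[str, Any]] = None,
) -> List[Tuple[Any, float]]:
    """Same search, returning (document, score) pairs."""
    kwargs: Dict[str, Any] = {"k": k}
    if metadata_filter is not None:
        kwargs["filter"] = metadata_filter
    return vector_store.similarity_search_with_score(query, **kwargs)
```

Leaving lexical_query unset keeps the default behavior, in which the same text drives both retrieval steps.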
Other search methods
There are a variety of other search methods that are not covered here, such as MMR search and search by vector. For a full list of the search modes available in AstraDBVectorStore, check out the API reference.
Query by turning into retriever
You can also make the vector store into a retriever, for easier usage in your chains. Transform the vector store into a retriever with its as_retriever method, then invoke it with a simple query and a metadata filter.
Usage for retrieval-augmented generation
For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG sections of the LangChain documentation. A complete RAG template using Astra DB is also available.
Cleanup vector store
If you want to completely delete the collection from your Astra DB instance, call the vector store's delete_collection method. (You will lose the data you stored in it.)
API reference
For detailed documentation of all AstraDBVectorStore features and configurations, consult the API reference.