Pluggability of Vector Store
This document outlines the steps for creating a custom vector store class that inherits from the provided BaseVectorStore interface.
Many LLM applications use vector stores to retrieve relevant information when generating responses. A vector store is a specialized database that stores and retrieves high-dimensional vector representations (embeddings) of data such as documents. These vectors capture the semantic meaning of the data and the relationships between its concepts.
Understanding the Base Interface:
To create your own vector store, extend the BaseVectorStore class and implement the following methods:
get_client (Required): An abstract method that subclasses must implement to retrieve the client object used to interact with the specific vector store backend (e.g., Pinecone, Faiss).
chunk_list (Optional): A helper function that splits a document list into batches of a specified size.
add_documents (Required): An abstract method for adding documents to the vector store. Subclasses implement their specific logic for document insertion.
similarity_search_with_score (Required): An abstract method for performing a similarity search on the vector store. Subclasses implement their specific logic for retrieving similar documents and scores based on a query string.
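The source does not show BaseVectorStore itself, so the following is only a sketch of what such an interface could look like. It assumes the class is built on Python's abc module and that chunk_list ships with a default implementation. The method signatures mirror the ones used later in this document; everything else is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence, Tuple


class BaseVectorStore(ABC):
    """Hypothetical stand-in for the project's real base class."""

    @abstractmethod
    def get_client(self) -> Any:
        """Return the client object for the vector store backend."""

    def chunk_list(self, document_list: Sequence[Any], chunk_size: int) -> List[List[Any]]:
        """Split document_list into consecutive batches of at most chunk_size items."""
        if chunk_size <= 0:
            raise ValueError("chunk_size must be a positive integer")
        return [
            list(document_list[i:i + chunk_size])
            for i in range(0, len(document_list), chunk_size)
        ]

    @abstractmethod
    def add_documents(self, documents: List[Any], fresh_collection: bool = False) -> List[Any]:
        """Insert documents and return their IDs."""

    @abstractmethod
    def similarity_search_with_score(
        self, query: str, collection_name: str, k: int = 20
    ) -> List[Tuple[Any, float]]:
        """Return up to k (document, score) pairs for the query."""
```

Under this sketch, a subclass that omits any of the three abstract methods cannot be instantiated (Python raises TypeError), while chunk_list is inherited unchanged unless the subclass overrides it.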
Implementation
Let's implement a custom vector store, YourVectorStoreClass, that inherits from BaseVectorStore, adds documents, and returns the documents whose text is semantically closest to the user query.
Create a new file named <your_vector_store_name>.py in the vectorstores folder to define YourVectorStoreClass.
Here's a brief overview of how you'll implement YourVectorStoreClass:
from vectorstores.base import BaseVectorStore


class YourVectorStoreClass(BaseVectorStore):
    def get_client(self):
        # Implement logic to retrieve the client for your vector store backend
        pass

    def chunk_list(self, document_list, chunk_size):
        # Optional: Implement logic to chunk documents into batches
        pass

    def add_documents(self, documents, fresh_collection: bool = False):
        # Implement logic to generate embeddings, add documents to your
        # vector store, and return the document IDs
        pass

    def similarity_search_with_score(self, query, collection_name: str, k: int = 20):
        # Args:
        #   query: The query string to search for.
        #   collection_name: Name of the collection within the vector store to search in.
        #   k: The maximum number of documents to fetch from the vector store (default: 20).
        # Implement logic to perform a similarity search and return results with scores
        pass
In this example, YourVectorStoreClass provides concrete implementations for the abstract methods defined in BaseVectorStore.
get_client: Retrieves the specific client used to interact with the vector store backend.
chunk_list: Optionally splits the document list into batches of a specified size for more manageable processing.
add_documents: Adds documents to the vector store.
similarity_search_with_score: Performs a similarity search and returns documents along with their similarity scores based on the query.
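To make the four methods concrete, here is a toy in-memory store with the same method signatures. It is a sketch under several assumptions: it does not import the project's BaseVectorStore (so that it runs standalone), it uses a hypothetical bag-of-words `_embed` helper instead of a real embedding model, and a stand-in `Doc` dataclass replaces LangChain's Document. A real implementation would inherit from BaseVectorStore and call the backend's client.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    """Stand-in for LangChain's Document (assumption, keeps the sketch standalone)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def _embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts. A real store would call an embedding model."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical example; a real class would inherit from BaseVectorStore."""

    def __init__(self, collection_name: str = "test"):
        self.collection_name = collection_name
        self._collections: Dict[str, List[Tuple[Doc, Counter]]] = {}

    def get_client(self):
        # The "client" here is just the dict that holds the collections.
        return self._collections

    def add_documents(self, documents: List[Doc], fresh_collection: bool = False) -> List[int]:
        client = self.get_client()
        if fresh_collection:
            client[self.collection_name] = []  # drop any existing contents
        rows = client.setdefault(self.collection_name, [])
        start = len(rows)
        rows.extend((doc, _embed(doc.page_content)) for doc in documents)
        # IDs are 1-based positions within the collection.
        return list(range(start + 1, len(rows) + 1))

    def similarity_search_with_score(self, query: str, collection_name: str, k: int = 20):
        rows = self.get_client().get(collection_name, [])
        q = _embed(query)
        scored = [(doc, _cosine(q, vec)) for doc, vec in rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        return scored[:k]
```

In this sketch, fresh_collection=True is taken to mean "replace the collection's contents", and False means "append". That reading is an assumption based on the parameter name.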
Go to the vectorstores folder and update __init__.py with the module lookup entry for YourVectorStoreClass.
_module_lookup = {
    ...,
    "YourVectorStoreClass": "vectorstores.<your_vector_store_name>",
}
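How the project consumes `_module_lookup` is not shown here. A common pattern for such tables is a lazy lookup that imports the registered module on first access using importlib. The sketch below assumes that pattern. The `load_class` helper name is hypothetical, and standard-library entries stand in for real vector store modules so that the snippet runs on its own.

```python
import importlib
from typing import Any, Dict

# Maps an exported class name to the dotted path of the module that defines it.
# Standard-library entries are used here purely so the sketch is runnable.
_module_lookup: Dict[str, str] = {
    "OrderedDict": "collections",
    "Fraction": "fractions",
}


def load_class(name: str) -> Any:
    """Import the module registered for name and return the attribute of that name."""
    if name not in _module_lookup:
        raise AttributeError(f"no module registered for {name!r}")
    module = importlib.import_module(_module_lookup[name])
    return getattr(module, name)
```

If the project follows this pattern, the value in the lookup has to be the importable dotted path of the module that defines the class, and the key has to match the class name exactly.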
Modify env_manager.py to import YourVectorStoreClass and add a mapping in the self.indexes dictionary.
from vectorstores import (
    ...,
    YourVectorStoreClass
)

self.indexes = {
    "vectorstore": {
        "class": {
            ...,
            "<your_vector_store_name>": YourVectorStoreClass
        },
        "env_key": "VECTOR_STORE_TYPE"
    }
}
This setup allows YourVectorStoreClass to be instantiated based on environment variables, integrating it into the environment management system. The self.indexes dictionary now includes a mapping from <your_vector_store_name> to YourVectorStoreClass and uses VECTOR_STORE_TYPE as the environment key.
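The selection step can be pictured as a dictionary lookup keyed by the value of the environment variable. The following is a sketch of that idea, not the project's actual env_manager.py: the `resolve_class` function and the placeholder class are hypothetical, and `my_store` stands in for <your_vector_store_name>.

```python
import os
from typing import Any, Dict, Mapping, Optional


class YourVectorStoreClass:
    """Placeholder so the sketch runs without the real project on the path."""


indexes: Dict[str, Dict[str, Any]] = {
    "vectorstore": {
        "class": {"my_store": YourVectorStoreClass},
        "env_key": "VECTOR_STORE_TYPE",
    }
}


def resolve_class(kind: str, env: Optional[Mapping[str, str]] = None) -> type:
    """Return the class registered under the value of the entry's env_key."""
    env = os.environ if env is None else env
    entry = indexes[kind]
    selected = env.get(entry["env_key"])
    if selected not in entry["class"]:
        known = ", ".join(sorted(entry["class"]))
        raise KeyError(f"{entry['env_key']}={selected!r} is not one of: {known}")
    return entry["class"][selected]
```

Passing the environment as an optional mapping keeps the lookup easy to test; called without it, the function reads from os.environ.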
Configuration
Configure the environment variables for connecting to the vector store in the .env file.
VECTOR_STORE_TYPE=<your_vector_store_name>
VECTOR_STORE_ENDPOINT=<vector_store_endpoint>
EMBEDDING_MODEL=<embedding_model_name>
VECTOR_COLLECTION_NAME=<collection_name>
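How these values are read is up to env_manager.py. As a rough sketch, they could be collected from the process environment and validated up front so that a missing setting fails early. The `load_vector_store_config` helper is hypothetical, and the sketch assumes the .env file has already been loaded into the environment by whatever mechanism the project uses.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_KEYS = (
    "VECTOR_STORE_TYPE",
    "VECTOR_STORE_ENDPOINT",
    "EMBEDDING_MODEL",
    "VECTOR_COLLECTION_NAME",
)


def load_vector_store_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the vector store settings, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```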
Example Usage
Here is an example of how to add and query documents using vectorstore_class.
Adding/Appending Documents
from env_manager import vectorstore_class
from langchain.docstore.document import Document

# True starts from a fresh collection
fresh_collection = True
documents = [
    Document(page_content="Test one", metadata={}),
    Document(page_content="Test two", metadata={}),
    Document(page_content="Test three", metadata={}),
]
documentIDs = vectorstore_class.add_documents(documents, fresh_collection)
print(documentIDs)  # Example output: [1, 2, 3] (the ID format depends on the backend)
Querying Documents
query = "Test one"
collection_name = "test"
documents = vectorstore_class.similarity_search_with_score(query, collection_name, k=20)
print(documents)
# Expected output:
# [
#     (Document(page_content="Test one", metadata={}), 0.95),
#     (Document(page_content="Test two", metadata={}), 0.65),
#     (Document(page_content="Test three", metadata={}), 0.61)
# ]
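The result is a list of (document, score) pairs, so post-processing is plain list handling. As a sketch, the hypothetical helper below keeps only the hits at or above a threshold. It assumes that higher scores mean more similar, as in the expected output above. Some backends return distances instead, where lower is better, and the comparison would need to be flipped.

```python
from typing import Any, List, Tuple


def filter_by_score(results: List[Tuple[Any, float]], min_score: float) -> List[Tuple[Any, float]]:
    """Keep (document, score) pairs whose score is at least min_score, order preserved."""
    return [(doc, score) for doc, score in results if score >= min_score]


# Plain strings stand in for Document objects to keep the sketch standalone.
sample = [("Test one", 0.95), ("Test two", 0.65), ("Test three", 0.61)]
strong_hits = filter_by_score(sample, min_score=0.65)
print(strong_hits)  # [('Test one', 0.95), ('Test two', 0.65)]
```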
By following this structure, you can efficiently interact with your custom vector store, adding and querying documents as needed.