Pluggability of Vector Store
This document outlines the steps for creating a custom vector store class that inherits from the provided BaseVectorStore interface.
Many LLM applications use vector stores to retrieve the information they need to generate relevant responses. A vector store is a specialized database that stores high-dimensional vector representations (embeddings) of data, such as documents, and retrieves them efficiently by similarity. These vectors capture the semantic meaning of the data, so pieces of text with related meanings map to vectors that lie close together.
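To make similarity-based retrieval concrete, here is a minimal sketch that ranks stored vectors by cosine similarity to a query vector. The three-dimensional vectors and the document texts are made-up toy values. Real embeddings come from an embedding model and have hundreds or thousands of dimensions, and a production vector store uses an index rather than scoring every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the k best, highest first."""
    scored = [(doc, cosine_similarity(query, vec)) for doc, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" with hypothetical values, chosen only for illustration.
store = {
    "cats are pets": [0.9, 0.1, 0.0],
    "dogs are pets": [0.8, 0.2, 0.1],
    "stock prices fell": [0.0, 0.1, 0.9],
}
results = top_k([0.85, 0.15, 0.05], store, k=2)
```

The two pet sentences outrank the finance sentence because their vectors point in nearly the same direction as the query vector.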
Understanding the Base Interface
To create your own vector store, extend the BaseVectorStore class and implement the following methods:

| Method | Description | Requirement |
|---|---|---|
| get_client | An abstract method that subclasses must implement to retrieve the client object used to interact with the specific vector store backend (e.g., Pinecone, Faiss). | Required |
| chunk_list | A helper function that splits a document list into batches of a specified size. | Optional |
| add_documents | An abstract method for adding documents to the vector store. Subclasses implement their specific logic for document insertion. | Required |
| similarity_search_with_score | An abstract method for performing similarity search on the vector store. Subclasses implement their specific logic for retrieving similar documents and scores based on a query string. | Required |
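One way to picture this interface is as an abstract base class. The sketch below assumes BaseVectorStore is built on Python's abc module. The real class in vectorstores/base.py may differ in signatures and return types, and IncompleteStore is a hypothetical subclass. Because the three required methods are marked @abstractmethod, Python refuses to instantiate a subclass that leaves any of them out, while chunk_list has a default body that subclasses may override.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseVectorStore(ABC):
    """Sketch of the interface described above; not the project's actual base class."""

    @abstractmethod
    def get_client(self) -> Any:
        """Return the backend client (for example a Pinecone or Faiss handle)."""

    def chunk_list(self, document_list: list, chunk_size: int) -> list[list]:
        """Optional helper: split document_list into consecutive batches of chunk_size."""
        return [document_list[i:i + chunk_size] for i in range(0, len(document_list), chunk_size)]

    @abstractmethod
    def add_documents(self, documents: list, fresh_collection: bool = False) -> list[str]:
        """Insert documents and return their IDs."""

    @abstractmethod
    def similarity_search_with_score(
        self, query: str, collection_name: str, k: int = 20
    ) -> list[tuple[Any, float]]:
        """Return up to k (document, score) pairs for the query."""


class IncompleteStore(BaseVectorStore):
    # Hypothetical subclass that forgets to implement similarity_search_with_score.
    def get_client(self):
        return None

    def add_documents(self, documents, fresh_collection=False):
        return []
```

Calling IncompleteStore() raises TypeError because similarity_search_with_score is still abstract. This is how the base class enforces the "Required" column.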
Implementation
Let's implement a custom vector store, YourVectorStoreClass, that inherits from BaseVectorStore, adds documents, and returns the documents most semantically similar to a user query.
Create a new file named <your_vector_store_name>.py in the vectorstores folder to define YourVectorStoreClass.
Here is an outline of YourVectorStoreClass:
```python
from vectorstores.base import BaseVectorStore


class YourVectorStoreClass(BaseVectorStore):
    def get_client(self):
        # Implement logic to create or return the client for your vector store backend
        pass

    def chunk_list(self, document_list, chunk_size):
        # Optional: override only if you need custom batching.
        # This version splits document_list into consecutive batches of chunk_size items.
        return [document_list[i:i + chunk_size] for i in range(0, len(document_list), chunk_size)]

    def add_documents(self, documents, fresh_collection: bool = False):
        # Implement logic to generate embeddings, add the documents to your
        # vector store, and return the IDs of the inserted documents
        pass

    def similarity_search_with_score(self, query, collection_name: str, k: int = 20):
        """Perform a similarity search and return matching documents with their scores.

        Args:
            query: The query string to search for.
            collection_name: Name of the collection within the vector store to search in.
            k: The maximum number of documents to fetch from the vector store (default: 20).
        """
        # Implement logic to perform a similarity search and return results with scores
        pass
```

In this example, YourVectorStoreClass provides placeholders for the methods defined in BaseVectorStore. Fill in each one with the logic for your backend:
get_client: Retrieves the specific client used to interact with the vector store backend.
chunk_list: Optionally, splits documents into chunks for more manageable processing.
add_documents: Adds documents to the vector store.
similarity_search_with_score: Performs a similarity search and returns documents along with their similarity scores based on the query.
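To show how the four methods fit together, here is a minimal in-memory sketch. It defines a small stand-in for BaseVectorStore instead of importing vectorstores.base so that it runs on its own. It also replaces a real embedding model with a toy bag-of-words vector. The constructor's collection_name parameter, the record layout, the UUID IDs, the batch size, and the cosine scoring are all illustrative assumptions. Here get_client simply returns a dictionary of collections that plays the role of a backend client.

```python
import math
import re
import uuid
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any


class BaseVectorStore(ABC):
    """Stand-in for vectorstores.base.BaseVectorStore so this sketch is self-contained."""

    @abstractmethod
    def get_client(self) -> Any: ...

    @abstractmethod
    def add_documents(self, documents, fresh_collection: bool = False): ...

    @abstractmethod
    def similarity_search_with_score(self, query, collection_name: str, k: int = 20): ...


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real store would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class YourVectorStoreClass(BaseVectorStore):
    def __init__(self, collection_name: str = "default"):
        self.collection_name = collection_name  # collection that add_documents writes to
        self._client = None

    def get_client(self) -> dict:
        # In-memory "client": collection name -> list of (id, text, vector) records.
        if self._client is None:
            self._client = {}
        return self._client

    def chunk_list(self, document_list, chunk_size):
        return [document_list[i:i + chunk_size] for i in range(0, len(document_list), chunk_size)]

    def add_documents(self, documents, fresh_collection: bool = False):
        client = self.get_client()
        if fresh_collection or self.collection_name not in client:
            client[self.collection_name] = []  # start from an empty collection
        ids = []
        for batch in self.chunk_list(documents, chunk_size=100):  # batch size is arbitrary here
            for text in batch:
                doc_id = str(uuid.uuid4())
                client[self.collection_name].append((doc_id, text, embed(text)))
                ids.append(doc_id)
        return ids

    def similarity_search_with_score(self, query, collection_name: str, k: int = 20):
        records = self.get_client().get(collection_name, [])
        query_vec = embed(query)
        scored = [(text, cosine(query_vec, vec)) for _, text, vec in records]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        return scored[:k]
```

In this sketch, fresh_collection=True discards the collection's existing records and the default appends to them. That reading of the flag is an assumption based on its name.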
Go to the vectorstores folder and update __init__.py with the module lookup entry for YourVectorStoreClass.
Modify env_manager.py to import YourVectorStoreClass and add a mapping in the self.indexes dictionary.
This setup lets YourVectorStoreClass be instantiated based on environment variables, integrating it into the environment management system. The self.indexes dictionary now maps customVector to YourVectorStoreClass, and VECTOR_STORE_TYPE is the environment key used to select it.
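A sketch of the env_manager.py change follows. Only self.indexes, the customVector key, and VECTOR_STORE_TYPE come from the steps above. The EnvManager class name, the get_vector_store method, and the placeholder YourVectorStoreClass are hypothetical, because the real env_manager.py is not shown here.

```python
import os


class YourVectorStoreClass:
    """Placeholder. In env_manager.py you would import the real class instead, e.g.
    from vectorstores.your_vector_store_name import YourVectorStoreClass
    """


class EnvManager:  # hypothetical name for the class defined in env_manager.py
    def __init__(self):
        # Maps each supported VECTOR_STORE_TYPE value to a vector store class.
        self.indexes = {
            "customVector": YourVectorStoreClass,
        }

    def get_vector_store(self):
        """Instantiate the class selected by the VECTOR_STORE_TYPE environment variable."""
        store_type = os.getenv("VECTOR_STORE_TYPE")
        if store_type not in self.indexes:
            raise ValueError(f"Unsupported VECTOR_STORE_TYPE: {store_type!r}")
        return self.indexes[store_type]()
```

Failing with a clear error on an unknown value makes a typo in the .env file easier to spot than a silent fallback would.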
Configuration
Configure your environment variables in the .env file for connecting to the vector store.
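As an illustration, the .env file can select the store with VECTOR_STORE_TYPE=customVector, the key used in env_manager.py, and also hold whatever connection details your backend needs. The names YOUR_VECTOR_STORE_URL and YOUR_VECTOR_STORE_API_KEY below are hypothetical placeholders, as are VectorStoreSettings and load_settings. Once the .env file has been loaded into the process environment (for example with python-dotenv's load_dotenv(), if your project uses it), get_client could read the settings like this:

```python
import os
from dataclasses import dataclass
from typing import Optional

# Example .env entries (names other than VECTOR_STORE_TYPE are hypothetical):
#   VECTOR_STORE_TYPE=customVector
#   YOUR_VECTOR_STORE_URL=https://vectors.example.com
#   YOUR_VECTOR_STORE_API_KEY=change-me


@dataclass(frozen=True)
class VectorStoreSettings:
    url: str
    api_key: Optional[str]


def load_settings() -> VectorStoreSettings:
    """Read connection settings from the environment, failing fast if the URL is missing."""
    url = os.getenv("YOUR_VECTOR_STORE_URL")
    if not url:
        raise RuntimeError("YOUR_VECTOR_STORE_URL is not set; add it to your .env file")
    return VectorStoreSettings(url=url, api_key=os.getenv("YOUR_VECTOR_STORE_API_KEY"))
```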
Example Usage
Here is an example of how to add and query documents using the vectorstore_class.
Adding/Appending Documents
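Here is a minimal sketch of adding and then appending documents. It assumes fresh_collection=True means "start from an empty collection" and the default False means "append to the existing one". InMemoryStore is a hypothetical stand-in with the same add_documents signature as YourVectorStoreClass, included only so the snippet runs on its own. In your application you would call add_documents on your configured vector store instead.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in exposing the add_documents signature of YourVectorStoreClass."""

    def __init__(self):
        self.docs: dict[str, str] = {}  # document ID -> text

    def add_documents(self, documents, fresh_collection: bool = False):
        if fresh_collection:
            self.docs.clear()  # drop existing documents before inserting
        ids = [str(uuid.uuid4()) for _ in documents]
        self.docs.update(zip(ids, documents))
        return ids


vectorstore = InMemoryStore()

# Create the collection from scratch.
first_ids = vectorstore.add_documents(["Doc about cats", "Doc about dogs"], fresh_collection=True)

# Append more documents to the existing collection.
more_ids = vectorstore.add_documents(["Doc about birds"])
```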
Querying Documents
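Querying follows the same pattern: call similarity_search_with_score with the query text, the collection to search, and k. InMemoryStore is again a hypothetical stand-in. Its scores are cosine similarities over word counts, so higher means more similar. Real backends may return distances instead, where lower is better, so check which convention your implementation uses.

```python
import math
from collections import Counter


class InMemoryStore:
    """Hypothetical stand-in with the similarity_search_with_score signature of YourVectorStoreClass."""

    def __init__(self, collections: dict[str, list[str]]):
        self.collections = collections  # collection name -> document texts

    def similarity_search_with_score(self, query, collection_name: str, k: int = 20):
        q = Counter(query.lower().split())
        results = []
        for text in self.collections.get(collection_name, []):
            d = Counter(text.lower().split())
            dot = sum(q[w] * d[w] for w in q)
            norm = math.sqrt(sum(v * v for v in q.values()) * sum(v * v for v in d.values()))
            results.append((text, dot / norm if norm else 0.0))
        results.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        return results[:k]


vectorstore = InMemoryStore({"docs": ["cats are small pets", "dogs are loyal pets", "bond yields rose"]})

# Fetch at most 2 matches from the "docs" collection, best first.
for text, score in vectorstore.similarity_search_with_score("which pets are loyal", collection_name="docs", k=2):
    print(f"{score:.3f}  {text}")
```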
By following this structure, you can efficiently interact with your custom vector store, adding and querying documents as needed.