Pluggability of Vector Store

This document outlines the steps for creating a custom vector store class that inherits from the provided BaseVectorStore interface.

Many LLM applications use vector stores to retrieve relevant information when generating responses. A vector store is a specialized database that stores and retrieves high-dimensional vector representations (embeddings) of data, such as documents. These vectors capture the semantic meaning of the data and the relationships between its concepts, so that semantically similar texts end up close together.
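To make the idea of "semantic closeness" concrete, here is a minimal sketch of ranking stored texts by cosine similarity. The three-dimensional vectors are made up for illustration; a real embedding model would produce them, usually with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example only.
store = {
    "cats are pets": [0.9, 0.1, 0.0],
    "dogs are pets": [0.8, 0.2, 0.1],
    "stock prices fell": [0.0, 0.1, 0.9],
}
query_vector = [0.85, 0.15, 0.05]

# Rank stored texts from most to least similar to the query vector.
ranked = sorted(
    store,
    key=lambda text: cosine_similarity(query_vector, store[text]),
    reverse=True,
)
print(ranked)
```

The unrelated text ranks last because its vector points in a different direction from the query vector.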

Understanding the Base Interface

To create your own vector store, you need to extend the BaseVectorStore class and implement the following methods:

Method/Property (Required/Optional): Description

  • get_client (Required): An abstract method that subclasses must implement to retrieve the client object used to interact with the specific vector store backend (e.g., Pinecone, Faiss).

  • chunk_list (Optional): Helper function that splits a document list into batches of a specified size.

  • add_documents (Required): An abstract method for adding documents to the vector store. Subclasses implement their specific logic for document insertion.

  • similarity_search_with_score (Required): An abstract method for performing similarity search on the vector store. Subclasses implement their specific logic for retrieving similar documents and scores based on a query string.
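The interface described above could be sketched as an abstract base class like the one below. This is not the project's actual BaseVectorStore. The name BaseVectorStoreSketch, the default chunk_list body, and the EchoStore subclass are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class BaseVectorStoreSketch(ABC):
    """Hypothetical stand-in for BaseVectorStore; the real class may differ."""

    @abstractmethod
    def get_client(self):
        """Return the client object for the vector store backend."""

    def chunk_list(self, document_list, chunk_size):
        """Split document_list into consecutive batches of at most chunk_size items."""
        return [
            document_list[i:i + chunk_size]
            for i in range(0, len(document_list), chunk_size)
        ]

    @abstractmethod
    def add_documents(self, documents, fresh_collection=False):
        """Insert documents and return their IDs."""

    @abstractmethod
    def similarity_search_with_score(self, query, collection_name, k=20):
        """Return up to k (document, score) pairs for the query."""


class EchoStore(BaseVectorStoreSketch):
    """Trivial subclass: all three required methods are overridden."""

    def get_client(self):
        return None

    def add_documents(self, documents, fresh_collection=False):
        return list(range(1, len(documents) + 1))

    def similarity_search_with_score(self, query, collection_name, k=20):
        return []


batches = EchoStore().chunk_list(["a", "b", "c", "d", "e"], 2)
print(batches)
```

Because the required methods are marked abstract, Python refuses to instantiate a subclass that leaves any of them out, which surfaces a missing method early.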

Implementation

Let's implement a custom vector store, YourVectorStoreClass, that inherits from BaseVectorStore, adds documents, and returns the documents whose text is semantically closest to the user query.

Create a new file named <your_vector_store_name>.py in the vectorstores folder to define YourVectorStoreClass.

Here's a brief overview of how you'll implement the YourVectorStoreClass:

from vectorstores.base import BaseVectorStore

class YourVectorStoreClass(BaseVectorStore):
    
    def get_client(self):
        # Implement logic to retrieve the client for your vector store backend
        pass

    def chunk_list(self, document_list, chunk_size):
        # Optional: Implement logic to chunk documents into batches
        pass

    def add_documents(self, documents, fresh_collection: bool = False):
        # Implement logic to generate embeddings, add the documents to your
        # vector store, and return the document IDs
        pass

    def similarity_search_with_score(self, query, collection_name: str, k: int = 20):
        """Return the documents most similar to the query, with their scores.

        Args:
            query: The query string to search for.
            collection_name: Name of the collection within the vector store to search in.
            k: The maximum number of documents to fetch from the vector store (default: 20).
        """
        # Implement logic to perform a similarity search and return results with scores
        pass

In this example, YourVectorStoreClass overrides the abstract methods defined in BaseVectorStore. Replace each pass statement with the logic for your backend:

  • get_client: Retrieves the specific client used to interact with the vector store backend.

  • chunk_list: Optionally splits the document list into batches for more manageable processing.

  • add_documents: Adds documents to the vector store.

  • similarity_search_with_score: Performs a similarity search and returns documents along with their similarity scores based on the query.
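For a sense of what filled-in methods might look like, here is a minimal in-memory sketch. It makes several assumptions: the BaseVectorStore and Document classes below are local stand-ins so the example runs on its own, the "embedding" is a toy bag-of-words count rather than a real model, and the collection name is passed to the constructor, which the real interface may handle differently.

```python
import math
from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for langchain's Document, to keep the sketch self-contained."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class BaseVectorStore(ABC):
    """Stand-in for vectorstores.base.BaseVectorStore (assumed shape)."""

    @abstractmethod
    def get_client(self): ...

    @abstractmethod
    def add_documents(self, documents, fresh_collection=False): ...

    @abstractmethod
    def similarity_search_with_score(self, query, collection_name, k=20): ...


def _embed(text):
    """Toy embedding: word counts. A real store would call an embedding model."""
    return Counter(text.lower().split())


def _cosine(a, b):
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore(BaseVectorStore):
    def __init__(self, collection_name="default"):
        self.collection_name = collection_name
        self._collections = {}  # collection name -> list of (document, vector)

    def get_client(self):
        # The "client" here is just the in-process dictionary.
        return self._collections

    def add_documents(self, documents, fresh_collection=False):
        client = self.get_client()
        if fresh_collection:
            client[self.collection_name] = []  # drop anything stored earlier
        rows = client.setdefault(self.collection_name, [])
        ids = []
        for doc in documents:
            rows.append((doc, _embed(doc.page_content)))
            ids.append(len(rows))  # 1-based position doubles as the ID
        return ids

    def similarity_search_with_score(self, query, collection_name, k=20):
        query_vector = _embed(query)
        rows = self.get_client().get(collection_name, [])
        scored = [(doc, _cosine(query_vector, vector)) for doc, vector in rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore("test")
ids = store.add_documents(
    [Document("Test one"), Document("Test two"), Document("Test three")],
    fresh_collection=True,
)
results = store.similarity_search_with_score("Test one", "test", k=2)
```

A production implementation would replace the dictionary with the backend's client and the word counts with vectors from the configured embedding model.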

Go to the vectorstores folder and update __init__.py with the module lookup entry for YourVectorStoreClass.

__init__.py
_module_lookup = {
    ...,
    "YourVectorStoreClass": "vectorstores.<your_vector_store_name>"
}
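A lookup table like this is commonly used to import classes lazily, on first access, instead of importing every backend at startup. The sketch below shows one way that resolution might work. The lazy_import helper is hypothetical, and the table points at standard-library modules only so the example runs on its own. How the project's __init__.py actually consumes _module_lookup is not shown in this document.

```python
import importlib

# Hypothetical lookup table: attribute name -> module that defines it.
# Standard-library entries keep the sketch runnable without the project.
_module_lookup = {
    "OrderedDict": "collections",
    "PurePath": "pathlib",
}


def lazy_import(name):
    """Import the module registered for `name` and return the attribute of that name."""
    if name not in _module_lookup:
        raise AttributeError(f"no lookup entry for {name!r}")
    module = importlib.import_module(_module_lookup[name])
    return getattr(module, name)


ordered_dict_cls = lazy_import("OrderedDict")
print(ordered_dict_cls.__name__)
```

With this pattern, the dictionary key must match the class name exactly, and the value must be the importable module path of the file that defines it.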

Modify env_manager.py to import YourVectorStoreClass and add a mapping in the self.indexes dictionary.

env_manager.py
from vectorstores import (
    ...,
    YourVectorStoreClass
)
env_manager.py
self.indexes = {
    "vectorstore": {
        "class": {
            ...,
            "<your_vector_store_name>": YourVectorStoreClass
        },
        "env_key": "VECTOR_STORE_TYPE"
    }
}

This setup lets YourVectorStoreClass be instantiated based on an environment variable. The self.indexes dictionary now maps the key <your_vector_store_name> to YourVectorStoreClass and uses VECTOR_STORE_TYPE as the environment key, so setting VECTOR_STORE_TYPE to that key selects your class.
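The selection step can be sketched as follows. The select_class helper, the ExistingStore class, and the keys "existing_store" and "your_vector_store" are hypothetical. The actual logic in env_manager.py is not shown in this document and may differ.

```python
import os


class ExistingStore:
    """Hypothetical stand-in for a vector store that is already registered."""


class YourVectorStoreClass:
    """Stand-in for the custom class defined earlier."""


indexes = {
    "vectorstore": {
        "class": {
            "existing_store": ExistingStore,
            "your_vector_store": YourVectorStoreClass,
        },
        "env_key": "VECTOR_STORE_TYPE",
    }
}


def select_class(indexes, kind):
    """Return the class registered under the value of the entry's environment key."""
    entry = indexes[kind]
    configured = os.environ.get(entry["env_key"])
    if configured not in entry["class"]:
        known = ", ".join(sorted(entry["class"]))
        raise ValueError(
            f"{entry['env_key']}={configured!r} is not registered; expected one of: {known}"
        )
    return entry["class"][configured]


os.environ["VECTOR_STORE_TYPE"] = "your_vector_store"
vectorstore = select_class(indexes, "vectorstore")()
print(type(vectorstore).__name__)
```

Raising an error that lists the registered keys makes a typo in VECTOR_STORE_TYPE easy to spot.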

Configuration

Configure your environment variables in the .env file for connecting to the vector store.

.env
VECTOR_STORE_TYPE=<your_vector_store_name>
VECTOR_STORE_ENDPOINT=<vector_store_endpoint>
EMBEDDING_MODEL=<embedding_model_name>
VECTOR_COLLECTION_NAME=<collection_name>
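A small sketch of reading and validating these four variables is shown below. It assumes the .env values have already been loaded into the process environment. The VectorStoreSettings class and the load_settings function are hypothetical names, not part of the project.

```python
import os
from dataclasses import dataclass

REQUIRED_KEYS = (
    "VECTOR_STORE_TYPE",
    "VECTOR_STORE_ENDPOINT",
    "EMBEDDING_MODEL",
    "VECTOR_COLLECTION_NAME",
)


@dataclass(frozen=True)
class VectorStoreSettings:
    store_type: str
    endpoint: str
    embedding_model: str
    collection_name: str


def load_settings(env=None):
    """Build settings from a mapping (defaults to os.environ), failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return VectorStoreSettings(
        store_type=env["VECTOR_STORE_TYPE"],
        endpoint=env["VECTOR_STORE_ENDPOINT"],
        embedding_model=env["EMBEDDING_MODEL"],
        collection_name=env["VECTOR_COLLECTION_NAME"],
    )


# Example values, invented for illustration.
settings = load_settings({
    "VECTOR_STORE_TYPE": "your_vector_store",
    "VECTOR_STORE_ENDPOINT": "http://localhost:8000",
    "EMBEDDING_MODEL": "example-embedding-model",
    "VECTOR_COLLECTION_NAME": "test",
})
print(settings.collection_name)
```

Checking all keys at once reports every missing variable in a single error instead of one per run.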

Example Usage

Here is an example of how to add and query documents using the vectorstore_class.

Adding/Appending Documents

from env_manager import vectorstore_class
from langchain.docstore.document import Document

fresh_collection = True
documents = [
    Document(page_content="Test one", metadata={}), 
    Document(page_content="Test two", metadata={}),
    Document(page_content="Test three", metadata={}),
]

document_ids = vectorstore_class.add_documents(documents, fresh_collection)
print(document_ids)  # Example output: [1, 2, 3] (the ID format depends on the backend)
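For larger document sets, insertion is often done in batches. The sketch below combines a chunk_list-style helper with add_documents. It assumes that fresh_collection=True resets the collection, so only the first batch passes it on. RecordingStore and add_in_batches are hypothetical names used to make the behaviour observable.

```python
def chunk_list(items, chunk_size):
    """Split items into consecutive batches of at most chunk_size elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


class RecordingStore:
    """Hypothetical store that records each batch it receives."""

    def __init__(self):
        self.batches = []
        self._next_id = 1

    def add_documents(self, documents, fresh_collection=False):
        if fresh_collection:
            # Assumed semantics: start over with an empty collection.
            self.batches.clear()
            self._next_id = 1
        self.batches.append(list(documents))
        ids = list(range(self._next_id, self._next_id + len(documents)))
        self._next_id += len(documents)
        return ids


def add_in_batches(store, documents, batch_size, fresh_collection=False):
    """Insert documents batch by batch and return all IDs in order."""
    ids = []
    for index, batch in enumerate(chunk_list(documents, batch_size)):
        # Only the first batch may reset the collection; later batches append.
        ids.extend(store.add_documents(batch, fresh_collection and index == 0))
    return ids


store = RecordingStore()
ids = add_in_batches(store, ["a", "b", "c", "d", "e"], batch_size=2, fresh_collection=True)
print(ids)
```

Passing fresh_collection=True on every batch would leave only the last batch in the collection under the assumed semantics, which is why the flag is limited to the first call.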

Querying Documents

query = "Test one"
collection_name = "test"
results = vectorstore_class.similarity_search_with_score(query, collection_name, k=20)
print(results)

# Example output (scores are illustrative):
# [
#    (Document(page_content="Test one", metadata={}), 0.95),
#    (Document(page_content="Test two", metadata={}), 0.65),
#    (Document(page_content="Test three", metadata={}), 0.61)
#  ]
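Callers often drop weak matches before passing results on to the LLM. Here is a minimal sketch, assuming that a higher score means a closer match. Some backends return distances instead, where lower is better, so check your backend's convention first. The filter_by_score helper is hypothetical, and plain strings stand in for Document objects.

```python
def filter_by_score(results, min_score):
    """Keep only (document, score) pairs whose score is at least min_score."""
    return [(doc, score) for doc, score in results if score >= min_score]


# Same shape as the output of similarity_search_with_score; values are illustrative.
results = [
    ("Test one", 0.95),
    ("Test two", 0.65),
    ("Test three", 0.61),
]

strong_matches = filter_by_score(results, min_score=0.65)
print(strong_matches)
```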

With these steps complete, you can add documents to your custom vector store and query them through the BaseVectorStore interface.
