Pluggability of Vector Store

This document outlines the steps for creating a custom vector store class that inherits from the provided BaseVectorStore interface.

Many LLM applications use vector stores to retrieve relevant information when generating responses. A vector store is a specialized database designed to store and retrieve high-dimensional vector representations (embeddings) of data, such as documents. These vectors capture the semantic meaning of the data and the relationships between the concepts in it.

Understanding the Base Interface

To create your own vector store, you need to extend the BaseVectorStore class and implement the following methods:

| Method/Property | Description | Required/Optional |
| --- | --- | --- |
| get_client | An abstract method that subclasses must implement to retrieve the client object used to interact with the specific vector store backend (e.g., Pinecone, Faiss). | Required |
| chunk_list | Helper function that splits a document list into batches of a specified size. | Optional |
| add_documents | An abstract method for adding documents to the vector store. Subclasses implement their specific logic for document insertion. | Required |
| similarity_search_with_score | An abstract method for performing similarity search on the vector store. Subclasses implement their specific logic for retrieving similar documents and scores based on a query string. | Required |
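To make the contract above concrete, here is a minimal sketch of what such a base interface could look like. It is not the actual BaseVectorStore source: the class name BaseVectorStoreSketch, the type hints, and the chunk_list body are assumptions made for illustration, and the real class may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator, List, Tuple


class BaseVectorStoreSketch(ABC):
    """Hypothetical stand-in for BaseVectorStore; the real class may differ."""

    @abstractmethod
    def get_client(self) -> Any:
        """Return the client object for the vector store backend."""

    def chunk_list(self, document_list: List[Any], chunk_size: int) -> Iterator[List[Any]]:
        """Yield consecutive batches of at most chunk_size documents."""
        if chunk_size <= 0:
            raise ValueError("chunk_size must be a positive integer")
        for start in range(0, len(document_list), chunk_size):
            yield document_list[start:start + chunk_size]

    @abstractmethod
    def add_documents(self, documents: List[Any], fresh_collection: bool = False) -> List[str]:
        """Insert documents and return their IDs."""

    @abstractmethod
    def similarity_search_with_score(
        self, query: str, collection_name: str, k: int = 20
    ) -> List[Tuple[Any, float]]:
        """Return up to k (document, score) pairs for the query."""
```

With this pattern, Python refuses to instantiate a subclass until every method marked as abstract has been implemented, which is what makes the three required methods mandatory while chunk_list can simply be inherited.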

Implementation

Let's implement a custom vector store, YourVectorStoreClass, that inherits from BaseVectorStore, adds documents, and returns the documents whose text is semantically closest to the user query.

Create a new file named <your_vector_store_name>.py in the vectorstores folder to define YourVectorStoreClass.

Here's a brief overview of how you'll implement the YourVectorStoreClass:

from vectorstores.base import BaseVectorStore


class YourVectorStoreClass(BaseVectorStore):

    def get_client(self):
        # Implement logic to retrieve the client for your vector store backend
        pass

    def chunk_list(self, document_list, chunk_size):
        # Optional: implement logic to split the document list into batches
        pass

    def add_documents(self, documents, fresh_collection: bool = False):
        # Implement logic to generate embeddings, add the documents to your
        # vector store, and return the document IDs
        pass

    def similarity_search_with_score(self, query, collection_name: str, k: int = 20):
        """Search the vector store for documents similar to the query.

        Args:
            query: The query string to search for.
            collection_name: Name of the collection within the vector store to search in.
            k: The maximum number of documents to fetch from the vector store (default: 20).
        """
        # Implement logic to perform a similarity search and return results with scores
        pass

In this example, we implement the YourVectorStoreClass, which provides concrete implementations for the abstract methods defined in BaseVectorStore.

  • get_client: Retrieves the specific client used to interact with the vector store backend.

  • chunk_list: Optionally splits the document list into batches of a specified size.

  • add_documents: Adds documents to the vector store.

  • similarity_search_with_score: Performs a similarity search and returns documents along with their similarity scores based on the query.
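As an illustration of the four methods working together, the following is a toy in-memory sketch rather than a production backend. Several things here are assumptions: the stand-in BaseVectorStore (so the block runs on its own), the word-count embed function in place of a real embedding model, the collection name passed to the constructor, and the reading of fresh_collection as "replace the existing collection".

```python
import math
import uuid
from collections import Counter


class BaseVectorStore:
    """Stand-in so the sketch runs on its own; in the project, use
    `from vectorstores.base import BaseVectorStore` instead."""


def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore(BaseVectorStore):
    def __init__(self, collection_name="default"):
        self.collection_name = collection_name
        self._collections = {}  # collection name -> {doc_id: (text, vector)}

    def get_client(self):
        # The "client" here is just the in-process dictionary.
        return self._collections

    def add_documents(self, documents, fresh_collection=False):
        client = self.get_client()
        if fresh_collection:
            client[self.collection_name] = {}  # assumed meaning: start over
        collection = client.setdefault(self.collection_name, {})
        ids = []
        for text in documents:
            doc_id = str(uuid.uuid4())
            collection[doc_id] = (text, embed(text))
            ids.append(doc_id)
        return ids

    def similarity_search_with_score(self, query, collection_name, k=20):
        query_vector = embed(query)
        collection = self.get_client().get(collection_name, {})
        scored = [(text, cosine(query_vector, vector)) for text, vector in collection.values()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

A real implementation would replace embed with calls to an embedding model and the dictionary with the backend client, but the shape of each method would stay similar.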

Go to the vectorstores folder and update __init__.py with the module lookup entry for YourVectorStoreClass.
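The exact shape of the module lookup in __init__.py is not shown in this guide. One plausible form, sketched below purely as an assumption, is a dictionary that maps each class name to the module defining it, resolved lazily with importlib. The names _MODULE_LOOKUP and load_class, and the module path vectorstores.your_vector_store_name, are hypothetical.

```python
import importlib

# Hypothetical lookup table: exported class name -> module that defines it.
# The real __init__.py may use a different structure.
_MODULE_LOOKUP = {
    "YourVectorStoreClass": "vectorstores.your_vector_store_name",
}


def load_class(class_name, lookup=_MODULE_LOOKUP):
    """Import the module registered for class_name and return the class."""
    try:
        module_path = lookup[class_name]
    except KeyError:
        raise ImportError(f"No module lookup entry for {class_name!r}") from None
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

Whatever the real structure is, the entry you add should follow the pattern of the existing vector store entries in that file.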

Modify env_manager.py to import YourVectorStoreClass and add a mapping in the self.indexes dictionary.

This setup ensures that YourVectorStoreClass can be instantiated based on specific environment variables, integrating it into the environment management system. The self.indexes dictionary now includes a mapping where customVector corresponds to YourVectorStoreClass, with VECTOR_STORE_TYPE as the environment key.
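The mapping described above could look roughly like the sketch below. Only customVector, VECTOR_STORE_TYPE, self.indexes, and YourVectorStoreClass come from this guide; the class name EnvManagerSketch, the get_vector_store_class method, and the flat dictionary layout are assumptions, and the real env_manager.py may be organized differently.

```python
import os


class YourVectorStoreClass:
    """Placeholder; in env_manager.py this would be imported from the vectorstores package."""


class EnvManagerSketch:
    """Hypothetical stand-in for the class defined in env_manager.py."""

    def __init__(self, environ=None):
        self.environ = os.environ if environ is None else environ
        # Value of VECTOR_STORE_TYPE -> vector store class.
        self.indexes = {
            "customVector": YourVectorStoreClass,
        }

    def get_vector_store_class(self):
        """Pick the vector store class named by VECTOR_STORE_TYPE."""
        store_type = self.environ.get("VECTOR_STORE_TYPE")
        if store_type not in self.indexes:
            raise ValueError(f"Unsupported VECTOR_STORE_TYPE: {store_type!r}")
        return self.indexes[store_type]
```

Failing loudly on an unknown value makes a misspelled VECTOR_STORE_TYPE easier to diagnose than silently falling back to a default store.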

Configuration

Configure your environment variables in the .env file for connecting to the vector store.
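Which variables you need depends on your backend. As a sketch, suppose the .env file sets VECTOR_STORE_TYPE=customVector plus two connection values. The names CUSTOM_VECTOR_URL and CUSTOM_VECTOR_API_KEY below are hypothetical, as is the helper read_vector_store_settings, which checks that everything is present before the store is created.

```python
import os

# VECTOR_STORE_TYPE comes from this guide; the other two names are hypothetical.
REQUIRED_VARS = ("VECTOR_STORE_TYPE", "CUSTOM_VECTOR_URL", "CUSTOM_VECTOR_API_KEY")


def read_vector_store_settings(environ=os.environ):
    """Return the required settings, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```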

Example Usage

Here is an example of how to add and query documents using the vectorstore_class.

Adding/Appending Documents
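The sketch below shows one way batching and the fresh_collection flag might be combined when loading documents. DemoVectorStore is a minimal stand-in for your own class, and treating fresh_collection=True as "clear the collection first" is an assumption about its semantics.

```python
import uuid


class DemoVectorStore:
    """Minimal stand-in for a BaseVectorStore subclass (assumption)."""

    def __init__(self):
        self.collection = {}  # doc_id -> document text

    def chunk_list(self, document_list, chunk_size):
        return [document_list[i:i + chunk_size] for i in range(0, len(document_list), chunk_size)]

    def add_documents(self, documents, fresh_collection=False):
        if fresh_collection:
            self.collection.clear()  # assumed meaning of fresh_collection
        ids = [str(uuid.uuid4()) for _ in documents]
        self.collection.update(zip(ids, documents))
        return ids


vectorstore = DemoVectorStore()
documents = [f"document {n}" for n in range(5)]

# The first batch starts a fresh collection; later batches append to it.
all_ids = []
for index, batch in enumerate(vectorstore.chunk_list(documents, chunk_size=2)):
    all_ids.extend(vectorstore.add_documents(batch, fresh_collection=(index == 0)))
```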

Querying Documents
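A query call and the handling of its (document, score) results might look like the following sketch. DemoVectorStore is again a stand-in: it scores by word overlap instead of embedding similarity, and the 0.3 threshold is an arbitrary value chosen for this demo.

```python
class DemoVectorStore:
    """Minimal stand-in; word overlap replaces real embedding similarity."""

    def __init__(self, collections):
        self.collections = collections  # collection name -> list of texts

    def similarity_search_with_score(self, query, collection_name, k=20):
        query_words = set(query.lower().split())
        scored = []
        for text in self.collections.get(collection_name, []):
            words = set(text.lower().split())
            union = query_words | words
            score = len(query_words & words) / len(union) if union else 0.0
            scored.append((text, score))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


vectorstore = DemoVectorStore(
    {"docs": ["reset your password", "update billing details", "password policy"]}
)
results = vectorstore.similarity_search_with_score("how to reset password", "docs", k=2)

# Keep only hits above a threshold chosen for this demo.
relevant = [(doc, score) for doc, score in results if score >= 0.3]
for doc, score in relevant:
    print(f"{score:.2f}  {doc}")
```

Whether a higher score means "more similar" depends on the backend: some return distances, where lower is better, so check the convention before applying a threshold.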

By following this structure, you can efficiently interact with your custom vector store, adding and querying documents as needed.
