Data Ingestion Process
After completing the installation, follow these steps to index all content related to a specific use case:
Release 3.0.0
Install Python on the machine where the files need to be ingested.
Clone the Git repository from https://github.com/Sunbird-AIAssistant/sakhi-api-service.
Go to the root directory of the cloned repository and update the .env file with the necessary vector store configuration values.
Run the following:
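Before running the command for this release, it can help to sanity-check the edited .env file. The sketch below is a minimal Python parser for simple KEY=VALUE lines that reports required keys which are missing or empty. The key names in REQUIRED_KEYS are hypothetical placeholders; the actual vector store variables are defined by the repository's own configuration.

```python
from pathlib import Path

# Hypothetical placeholder names: replace them with the vector store keys
# that your copy of sakhi-api-service actually expects in .env.
REQUIRED_KEYS = ["VECTOR_STORE_TYPE", "VECTOR_STORE_URL"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # run from the repository root
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(values, REQUIRED_KEYS)
    if missing:
        print("Missing or empty keys:", ", ".join(missing))
    else:
        print(".env has all required keys")
```

The parser deliberately handles only plain KEY=VALUE lines; if the repository's .env uses more advanced syntax (for example variable interpolation), a dedicated loader is a better fit.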
Before Release 3.0.0
Install Python on the machine where the files need to be ingested.
Place the files to be indexed in a folder on the machine.
Download the index_documents.py and requirements-dev.txt files from https://github.com/Sunbird-AIAssistant/sakhi-api-service.
Run the following:
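Before running the command, a quick preflight check can confirm that the steps above are complete: both downloaded files sit in the working directory and the input folder actually contains files. This is a minimal sketch; the helper name preflight and the default folder name "data" are hypothetical, not part of the repository.

```python
import sys
from pathlib import Path

# The two files the steps above say to download from sakhi-api-service.
REQUIRED_FILES = ["index_documents.py", "requirements-dev.txt"]


def preflight(workdir: Path, input_folder: Path) -> list[str]:
    """Return a list of problems; an empty list means ready to index."""
    problems = []
    for name in REQUIRED_FILES:
        if not (workdir / name).is_file():
            problems.append(f"missing {name} in {workdir}")
    if not input_folder.is_dir():
        problems.append(f"input folder not found: {input_folder}")
    elif not any(p.is_file() for p in input_folder.rglob("*")):
        # rglob also looks inside subfolders of the input folder.
        problems.append(f"input folder has no files: {input_folder}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python preflight.py /path/to/files
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("data")
    issues = preflight(Path.cwd(), folder)
    print("\n".join(issues) if issues else "ready to index")
```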
Notes:
Run the commands inside a background screen session, as indexing can take a couple of hours.
Use --fresh_index when you run the indexing for the first time, or when you want to delete the existing index and re-index everything from scratch. To append new files to the existing index, run the command without --fresh_index.
When running without --fresh_index, keep the new files in a separate folder and point --folder_path at that folder only, so that previously indexed files are not indexed a second time.
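The notes above can be sketched in Python: one helper copies only not-yet-indexed files into a brand-new folder for an append run, and another builds a detached screen command for index_documents.py. Only --folder_path and --fresh_index come from the notes; the script may require other arguments, and the exact flag syntax it accepts is an assumption. The bookkeeping set already_indexed, the helper names, and the session name "indexing" are hypothetical.

```python
import sys
import shutil
from pathlib import Path


def stage_new_files(source: Path, already_indexed: set[str], staging: Path) -> list[Path]:
    """Copy files not yet indexed into a new folder for an append run.

    `already_indexed` is a set of file names you track yourself (an
    assumption; index_documents.py is not known to provide this).
    """
    # exist_ok=False raises FileExistsError, enforcing a fresh folder.
    staging.mkdir(parents=True, exist_ok=False)
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.name not in already_indexed:
            target = staging / path.name
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied


def build_command(folder: Path, fresh: bool, session: str = "indexing") -> list[str]:
    """Build a `screen` command that runs index_documents.py detached."""
    script = [sys.executable, "index_documents.py", "--folder_path", str(folder)]
    if fresh:
        # Only for the first run, or to rebuild the index from scratch.
        script.append("--fresh_index")
    # screen -dmS starts a named session already detached from the terminal.
    return ["screen", "-dmS", session, *script]
```

For an append run, one might call stage_new_files(Path("all_docs"), indexed_names, Path("batch2")) and then subprocess.run(build_command(Path("batch2"), fresh=False), check=True); screen -r indexing reattaches to the session to follow progress.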