# Data Ingestion Process

After completing the installation, follow these steps to index all content related to a specific use case:

### Release 3.0.0 and later

1. Install Python 3 on the machine where the files will be ingested.
2. Clone the Git repository from [https://github.com/Sunbird-AIAssistant/sakhi-api-service](https://github.com/Sunbird-AIAssistant/sakhi-api-service).
3. Go to the root directory of the cloned repository and update the `.env` file with the required [vector store configuration](/components/sakhi-api-service/environment-variables.md) values.
4. Run the following commands from the repository root:

{% code overflow="wrap" %}

```bash
# Step 1: install the dependencies
pip install -r requirements-dev.txt

# Step 2: index the documents in the input folder
python3 index_documents.py --folder_path=<PATH_TO_INPUT_FILE_DIRECTORY> --fresh_index --chunk_size=1024 --chunk_overlap=100

# --fresh_index: create a new index from scratch (omit it to append to an existing index).
# --chunk_size: split the documents into chunks of 1024 characters. Default: 1024
# --chunk_overlap: overlap consecutive chunks by 100 characters to preserve context. Default: 100
```

{% endcode %}
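To illustrate what `--chunk_size` and `--chunk_overlap` control, here is a minimal sketch of fixed-size character chunking with overlap. It assumes a plain sliding window over characters; `index_documents.py` may use a different splitter (for example, one that prefers paragraph or sentence boundaries), so treat this as an illustration of the two parameters rather than the script's actual logic. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so text near a chunk
    boundary appears in both neighbouring chunks and keeps some context.
    Hypothetical helper: a plain sliding window, not the script's own splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    pieces = chunk_text("x" * 2500, chunk_size=1024, chunk_overlap=100)
    print([len(p) for p in pieces])  # [1024, 1024, 652]
```

With the default values, a 2,500-character document yields three chunks: the window advances by 924 characters (1024 − 100) each time, and the last chunk holds the remaining 652 characters.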

### Before Release 3.0.0

1. Install Python 3 on the machine where the files will be ingested.
2. Place the files to be indexed in a folder on the machine.
3. Download the `index_documents.py` and `requirements-dev.txt` files from [https://github.com/Sunbird-AIAssistant/sakhi-api-service](https://github.com/Sunbird-AIAssistant/sakhi-api-service).
4. Run the following commands from the folder that contains these two files:

{% code overflow="wrap" %}

```bash
# Step 1: install the dependencies
pip install -r requirements-dev.txt

# Step 2: index the documents into the given Marqo index
python3 index_documents.py --marqo_url=<MARQO_URL> --index_name=<MARQO_INDEX_NAME> --folder_path=<PATH_TO_INPUT_FILE_DIRECTORY> --fresh_index
```

{% endcode %}
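If you trigger the indexing from another Python script or a scheduler, the command can be assembled programmatically. The following is a minimal sketch, assuming the pre-3.0.0 flags shown above (`--marqo_url`, `--index_name`, `--folder_path`, `--fresh_index`); the helper names `build_index_command` and `run_indexing`, and the example URL, index name and folder, are hypothetical placeholders. It uses `sys.executable` (the running interpreter) in place of `python3`.

```python
import subprocess
import sys


def build_index_command(marqo_url: str, index_name: str, folder_path: str,
                        fresh_index: bool = False) -> list[str]:
    """Assemble the argument list for the pre-3.0.0 index_documents.py call.

    Hypothetical helper: the flags mirror the command shown above.
    --fresh_index is added only when a brand-new index should be built.
    """
    cmd = [
        sys.executable, "index_documents.py",
        f"--marqo_url={marqo_url}",
        f"--index_name={index_name}",
        f"--folder_path={folder_path}",
    ]
    if fresh_index:
        cmd.append("--fresh_index")
    return cmd


def run_indexing(cmd: list[str]) -> None:
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: replace them with your own Marqo URL, index name and folder.
    command = build_index_command("http://localhost:8882", "my_index", "./docs",
                                  fresh_index=True)
    print(" ".join(command))
    # run_indexing(command)  # requires index_documents.py in the current directory
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when folder paths contain spaces.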

**Notes**:

1. Run the commands inside a background terminal session (for example, with `screen`), because indexing can take a couple of hours.
2. Use `--fresh_index` when you index for the first time, or when you want to delete the existing index and rebuild it from scratch. To append new files to an existing index, run the command without `--fresh_index`.
3. When running without `--fresh_index`, keep the new files in a separate folder and point `--folder_path` only at that folder, so that only the new files are indexed.
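The third note can be scripted. The sketch below copies only the files that have not been indexed before into a separate folder, which you then pass as `--folder_path` in a run without `--fresh_index`. It assumes you keep your own record of already-indexed file names (here, the hypothetical set `already_indexed`) and that the input folder is flat (subfolders are not scanned); neither the record nor the helper `stage_new_files` is part of sakhi-api-service.

```python
import shutil
import tempfile
from pathlib import Path


def stage_new_files(source_dir: str, already_indexed: set[str], staging_dir: str) -> list[str]:
    """Copy files from source_dir whose names are not in already_indexed into staging_dir.

    Hypothetical helper: already_indexed holds file names (not full paths)
    from a previous run. Only the top level of source_dir is scanned.
    Returns the sorted names of the copied files.
    """
    src = Path(source_dir)
    dst = Path(staging_dir)
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.name not in already_indexed:
            shutil.copy2(path, dst / path.name)  # copy contents and metadata
            copied.append(path.name)
    return copied


if __name__ == "__main__":
    # Self-contained demo using temporary folders.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp, "all_docs")
        source.mkdir()
        for name in ("a.pdf", "b.pdf", "c.pdf"):
            (source / name).write_text("placeholder")
        new = stage_new_files(str(source), {"a.pdf"}, str(Path(tmp, "new_docs")))
        print(new)  # ['b.pdf', 'c.pdf']
        # Next: run index_documents.py with --folder_path pointing at new_docs
        # and without --fresh_index.
```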

