Large Language Models (LLMs) have become pivotal in reshaping the world by enabling advanced natural language processing tasks such as document analysis, content generation, and conversational assistance. Their ability to process and generate human-like text has unlocked unprecedented opportunities across different domains such as healthcare, education, finance, and more. However, commercial LLM platforms face several limitations, including data privacy concerns, context size restrictions, lack of parameter configurability, and limited evaluation capabilities. These shortcomings hinder their effectiveness, particularly in scenarios involving sensitive information, large-scale document analysis, or the need for customized output. This underscores the need for a tool that combines the power of LLMs with enhanced privacy, flexibility, and usability.
To address these challenges, we present EvidenceBot, a local, Retrieval-Augmented Generation (RAG)-based solution designed to overcome the limitations of commercial LLM platforms. EvidenceBot enables secure and efficient processing of large document sets through its privacy-preserving RAG pipeline, which extracts and appends only the most relevant text chunks as context for queries. The tool allows users to experiment with hyperparameter configurations, optimizing model responses for specific tasks, and includes an evaluation module to assess LLM performance against ground truths using semantic and similarity-based metrics. By offering enhanced privacy, customization, and evaluation capabilities, EvidenceBot bridges critical gaps in the LLM ecosystem, providing a versatile resource for individuals and organizations seeking to leverage LLMs effectively.
The architecture of the proposed pipeline is shown in the system diagram. The pipeline is capable of processing documents in various formats, such as HTML, txt, md, py, pdf, csv, xlsx, and docx. To handle these different file types, the pipeline utilizes the LangChain document loader.
Once the documents are loaded, they are divided into smaller text chunks (default size = 512 tokens), with the option for the user to adjust the chunk size, thus controlling the number of chunks created for the given set of documents. These text chunks are then converted into embeddings using a preselected embedding model.
Following this, the pipeline leverages LangChain to generate semantic indexes for each embedding, facilitating the retrieval and ranking of relevant information based on context and meaning rather than simple keyword matching. These embeddings and their corresponding semantic indexes are stored in ChromaDB, a high-performance vector database optimized for efficient similarity search.
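A minimal sketch of this ingestion flow, assuming the classic LangChain API (module paths vary across LangChain releases; the folder, glob pattern, embedding model, and persistence directory below are illustrative, not EvidenceBot's exact choices):

```python
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Load every .txt file from the document folder (other loaders cover pdf, docx, etc.).
loader = DirectoryLoader("DATA_DOCUMENTS", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# Split into chunks; chunk_size here counts characters, so a token-based
# splitter (e.g., RecursiveCharacterTextSplitter.from_tiktoken_encoder) would
# match the 512-token default more closely.
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Embed each chunk and store the vectors in a persistent ChromaDB index.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma.from_documents(chunks, embeddings, persist_directory="DB")
```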
When a user submits a query, the pipeline converts it into an embedding using the same model previously used for the documents. A semantic search is then performed within the vector database, returning the top k most relevant results; the user can specify the value of k within the tool. These relevant text snippets are appended to the user's query and passed to the language model, ensuring the model has the complete context required to generate an accurate response. For compiling and running the language models, we used Ollama, an open-source platform that facilitates the deployment and management of local LLMs.
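The retrieval-and-generation step can then be sketched as follows, reusing the db index from the sketch above; the ollama Python client and the llama3 model name are assumptions for illustration:

```python
import ollama  # client for a locally running Ollama server

def answer(query: str, k: int = 4, model: str = "llama3") -> str:
    # Chroma embeds the query with the same model used for the documents,
    # then returns the top-k most similar chunks.
    hits = db.similarity_search(query, k=k)
    context = "\n\n".join(doc.page_content for doc in hits)
    # Append the retrieved snippets to the user's query before generation.
    prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return ollama.generate(model=model, prompt=prompt)["response"]
```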
The EvidenceBot dashboard uses Streamlit, HTML, and CSS to create an intuitive, interactive user interface. Streamlit serves as the primary framework, integrating with the Python logic and language models while managing user inputs, file uploads, and dynamic outputs. HTML structures key elements such as the navigation bar, input forms, and chat containers, while CSS enhances the visual design by setting background colors, adjusting container layouts, and ensuring responsiveness. The combination of these tools ensures a streamlined user experience.
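A minimal sketch of this wiring (the header markup, model list, and layout here are placeholders; the real dashboard is more elaborate, and answer comes from the retrieval sketch above):

```python
import streamlit as st

# Custom HTML/CSS block rendered inline by Streamlit.
st.markdown(
    "<div style='background:#2c3e50;padding:8px;border-radius:6px;'>"
    "<h2 style='color:white;margin:0;'>EvidenceBot</h2></div>",
    unsafe_allow_html=True,
)

model = st.selectbox("Model", ["llama3", "mistral"])  # placeholder choices
query = st.text_input("Enter your prompt")
if st.button("Submit") and query:
    with st.spinner("Generating..."):
        st.write(answer(query, model=model))
```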
The application is, by default, accessible at localhost:8501 on the local machine. In this mode, users can provide a prompt and receive responses generated by a selected Large Language Model (LLM). Users can also adjust several model parameters, as described in the Application Parameters section.
The model selector dynamically lists all LLMs installed locally. Upon query submission, the system builds a vector database from files placed in the DATA_DOCUMENTS folder. Relevant chunks are retrieved and provided as context to the model for response enhancement.
If any parameters are changed, the application should be restarted to regenerate the vector database accordingly.
Responses are logged in the History/log.csv file along with timestamps, original prompts, and source content.
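A hypothetical sketch of the log append; the exact column layout of History/log.csv is an assumption based on the fields described above:

```python
import csv
from datetime import datetime

def log_response(prompt: str, response: str, sources: str) -> None:
    # One row per interaction: timestamp, original prompt, response, source content.
    with open("History/log.csv", "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.now().isoformat(), prompt, response, sources]
        )
```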
Selecting the Batch Question Mode enables processing a list of questions via an uploaded .csv file. Each row should contain a single question; the questions are processed sequentially by the model. Progress is tracked during execution, and results are saved in History/log.csv.
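Batch mode amounts to a loop over the uploaded rows, as in this sketch (the questions.csv name is illustrative; answer and log_response come from the earlier sketches):

```python
import csv

# One question per row, as described above.
with open("questions.csv") as f:
    questions = [row[0] for row in csv.reader(f) if row]

for i, q in enumerate(questions, 1):
    reply = answer(q)
    log_response(q, reply, sources="")
    print(f"Processed {i}/{len(questions)}")  # progress tracking
```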
To evaluate an individual response, use the Evaluate mode at localhost:8502. Users must input both the reference and generated texts for comparison. The tool outputs BLEU-4, ROUGE-L, BERTScore, and Cosine Similarity metrics with visualization support.
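The four metrics can be reproduced with common Python libraries (nltk, rouge-score, bert-score, sentence-transformers), as in this sketch; EvidenceBot's exact implementation may differ:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from bert_score import score as bert_score
from sentence_transformers import SentenceTransformer, util

def evaluate_pair(reference: str, candidate: str) -> dict:
    # BLEU-4: 4-gram overlap precision, smoothed for short texts.
    bleu4 = sentence_bleu(
        [reference.split()], candidate.split(),
        weights=(0.25, 0.25, 0.25, 0.25),
        smoothing_function=SmoothingFunction().method1,
    )
    # ROUGE-L: longest-common-subsequence F-measure.
    rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(
        reference, candidate)["rougeL"].fmeasure
    # BERTScore: semantic similarity from contextual token embeddings.
    _, _, f1 = bert_score([candidate], [reference], lang="en")
    # Cosine similarity between whole-text sentence embeddings.
    emb = SentenceTransformer("all-MiniLM-L6-v2").encode([reference, candidate])
    cosine = util.cos_sim(emb[0], emb[1]).item()
    return {"BLEU-4": bleu4, "ROUGE-L": rouge_l,
            "BERTScore": f1.item(), "CosineSim": cosine}
```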
Users can evaluate a batch of responses by uploading two .csv files: reference.csv (ground truth) and candidate.csv (model outputs). The system visualizes metric comparisons across the dataset using bar plots.
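Batch evaluation then reduces to scoring aligned rows and plotting the averages, as below (assuming one text per row and no header in each file, and evaluate_pair from the sketch above):

```python
import pandas as pd
import matplotlib.pyplot as plt

refs = pd.read_csv("reference.csv", header=None)[0]
cands = pd.read_csv("candidate.csv", header=None)[0]

# Score each aligned (reference, candidate) pair, then average per metric.
scores = pd.DataFrame([evaluate_pair(r, c) for r, c in zip(refs, cands)])
scores.mean().plot(kind="bar", title="Average metric scores across the dataset")
plt.tight_layout()
plt.show()
```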
The app supports three RAG parameters and eight model-specific generation parameters.
The default embedding model is openai/text-embedding-ada-002, and the generation defaults mirror Ollama's standard options, including 1.0 (temperature), 0.9 (top_p), 40 (top_k), 2048 (num_ctx), 1.1 (repeat_penalty), 0 (mirostat), 0.1 (mirostat_eta), and 5.0 (mirostat_tau).
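These names map directly onto Ollama's documented generation options; below is a sketch of passing them through the ollama Python client (the model name and prompt are illustrative):

```python
import ollama

result = ollama.generate(
    model="llama3",
    prompt="Summarize the key findings in the provided context.",
    options={               # Ollama generation options with the defaults above
        "temperature": 1.0,
        "top_p": 0.9,
        "top_k": 40,
        "num_ctx": 2048,
        "repeat_penalty": 1.1,
        "mirostat": 0,
        "mirostat_eta": 0.1,
        "mirostat_tau": 5.0,
    },
)
print(result["response"])
```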
To install the app, do the following:
1. Clone the repository:
git clone https://github.com/Nafiz43/EvidenceBot
2. Create and activate a conda environment:
conda create -n EvidenceBot python=3.10.0
conda activate EvidenceBot
3. Install the dependencies:
pip install -r requirements.txt
4. Pull your desired model:
ollama pull MODEL_NAME
Replace MODEL_NAME with the name of your desired model; the list of all available models is in the Ollama model library.
To run the app:
1. Place the documents you want to analyze in the DATA_DOCUMENTS folder.
2. Make sure your chosen model has been pulled:
ollama pull MODEL_NAME
3. Launch the app:
sh command.sh
The amount of VRAM required depends on the model being run; as a rough estimate, following Ollama's guidance, 7B models need at least 8 GB of memory, 13B models at least 16 GB, and 33B models at least 32 GB.
Note: While lower configurations are viable, performance may be compromised, leading to longer execution times and potential system slowdowns.
If deploying to cloud infrastructure, download the Dockerfile from the given link, then use the following commands to build and run the Docker container securely.
1. Build the Docker image:
docker build -t evidencebot .
2. Run the Docker container and expose port 8501:
docker run -it --rm -p 8501:8501 -v $(pwd)/DATA_DOCUMENTS:/app/DATA_DOCUMENTS evidencebot
This command:
- Maps your local port 8501 to the container's port 8501
- Mounts the DATA_DOCUMENTS folder so your documents are accessible inside the container
- Removes the container automatically when it exits (--rm)

This research was supported by the National Science Foundation under Grant No. 2020751, as well as by the Alfred P. Sloan Foundation through the OSPO for UC initiative (Award No. 2024-22424).
The EvidenceBot project is licensed under the Apache License 2.0. This permissive license allows you to use, modify, and distribute the software for both personal and commercial purposes, as long as you include proper attribution and comply with the terms outlined in the license.
Contributions are very welcome! If you'd like to add features, fix bugs, or improve the documentation, please feel free to fork the repository and create a pull request. Make sure your changes are well-documented and follow the project's coding standards.
We appreciate your interest in improving this project—thank you for helping make it better!
For high-level discussions, funding opportunities, or collaboration inquiries, please reach out to the project supervisor, Professor Vladimir Filkov (vfilkov@ucdavis.edu).
For technical questions, bug reports, or concerns regarding the codebase, please contact the project lead, Nafiz Imtiaz Khan (nikhan@ucdavis.edu).
We're excited to hear from you!
@inproceedings{khan2025evidencebot,
author = {Nafiz Imtiaz Khan and Vladimir Filkov},
title = {EvidenceBot: A Privacy-Preserving, Customizable RAG-Based Tool for Enhancing Large Language Model Interactions},
booktitle = {Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering (FSE Companion '25)},
year = {2025},
doi = {10.1145/3696630.3728607},
isbn = {979-8-4007-1276-0},
location = {Trondheim, Norway},
publisher = {ACM},
address = {New York, NY, USA}
}