# simple_rag_ch

**Repository Path**: objdump/simple_rag_ch

## Basic Information

- **Project Name**: simple_rag_ch
- **Description**: No description available
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-23
- **Last Updated**: 2025-10-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

A lightweight Retrieval-Augmented Generation (RAG) example adapted from simple_rag, with SiliconFlow API integration and safer credential handling.

Original project: https://github.com/AightBits/simple_rag

Original README: ./README_original.md

## What Changed

- SiliconFlow APIs are used for both embeddings and chat completions
  - Embedding model: `BAAI/bge-m3`
  - Chat model: `deepseek-ai/DeepSeek-V3.1-Terminus`
- Environment variable renamed to `SILICONFLOW_API_KEY`
- Added `.env.sh` for easy environment setup and updated `.gitignore` to exclude it
- Kept the original scripts and flow, minimizing code changes
- Clarification: "objdump" in this repo context refers to the Git user ID, not the GNU objdump tool

## Project Structure

```
simple_rag_ch/
├── simple_rag_ch/
│   ├── config.py          # Reads SILICONFLOW_API_KEY and other settings
│   ├── embedding_api.py   # Calls SiliconFlow Embeddings API
│   ├── ingest.py          # Full-document ingestion into ChromaDB
│   ├── ingest_chunk.py    # Chunked ingestion (optional)
│   ├── infer.py           # Interactive retrieval + chat completion
│   ├── projects/          # Place your .txt files under a project folder
│   └── chroma_db/         # Vector database (auto-created)
├── .env.sh                # Export SILICONFLOW_API_KEY (gitignored)
├── ENV_SETUP.md           # How to load and manage environment variables
├── README_original.md     # Original upstream README kept for reference
└── README.md              # This file
```

## Setup

1. Create and load your environment variables
   - Edit `.env.sh` and set your key, then:
     ```bash
     source .env.sh
     ```
   - Verify:
     ```bash
     echo $SILICONFLOW_API_KEY
     ```
   - See `ENV_SETUP.md` for details
2. Install dependencies and prepare folders
   ```bash
   bash simple_rag_ch/setup.sh
   ```

## Usage

1. Prepare data
   - Create a project folder and add `.txt` files:
     ```
     simple_rag_ch/projects//*.txt
     ```
2. Ingest data (choose one)
   - Full document ingest:
     ```bash
     python simple_rag_ch/ingest.py
     ```
   - Chunked ingest (for large docs):
     ```bash
     python simple_rag_ch/ingest_chunk.py
     ```
3. Query
   ```bash
   python simple_rag_ch/infer.py
   ```

## Configuration

Edit `simple_rag_ch/config.py` as needed:

- `DB_PATH`: ChromaDB path
- `OPENAI_API_URL`: SiliconFlow chat completions endpoint
- `OPENAI_MODEL`: Chat model name
- `RELEVANCE_THRESHOLD`: Similarity threshold for retrieval
- `TOP_N_RESULTS`: Max retrieved docs
- `PROJECTS_DIR`: Data directory

## Security Notes

- Do not commit API keys. `.env.sh` is gitignored
- Use `SILICONFLOW_API_KEY` from the environment (no hardcoded keys)
- Rotate keys periodically

## Credits and License

- Based on the original simple_rag project (Apache 2.0)
- This repository remains under Apache License 2.0
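
## Appendix: Reading the API Key (Sketch)

The Security Notes ask that `SILICONFLOW_API_KEY` come from the environment and never be hardcoded. The sketch below shows one way a script could follow that rule: fail early when the variable is missing, and mask the key before printing it. The helper names `load_api_key` and `mask_key` are hypothetical and are not taken from this repository's `config.py`; treat this as an illustration under those assumptions, not as the project's actual code.

```python
import os
from typing import Mapping, Optional

# Variable name taken from this README; everything else here is illustrative.
ENV_VAR = "SILICONFLOW_API_KEY"


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    source = os.environ if env is None else env
    key = source.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set. Run `source .env.sh` first.")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so the key is safer to log."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


if __name__ == "__main__":
    try:
        print("Loaded key:", mask_key(load_api_key()))
    except RuntimeError as exc:
        print(exc)
```

Printing the masked form instead of running `echo $SILICONFLOW_API_KEY` confirms that the variable is loaded without leaving the full key in terminal scrollback or logs.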