# simple_rag_ch

**Repository Path**: objdump/simple_rag_ch

## Basic Information

- **Project Name**: simple_rag_ch
- **Description**: No description available
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-23
- **Last Updated**: 2025-10-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# simple_rag_ch

A lightweight Retrieval-Augmented Generation (RAG) example adapted from simple_rag with SiliconFlow API integration and safer credential handling.
Original project: https://github.com/AightBits/simple_rag
Original README: ./README_original.md

## What Changed

- SiliconFlow APIs used for both embeddings and chat completions
  - Embedding model: `BAAI/bge-m3`
  - Chat model: `deepseek-ai/DeepSeek-V3.1-Terminus`
- Environment variable renamed to `SILICONFLOW_API_KEY`
- Added `.env.sh` for easy environment setup and updated `.gitignore` to exclude it
- Kept the original scripts and flow, minimizing code changes
- Clarification: "objdump" in this repo context refers to the Git user ID, not the GNU objdump tool

## Project Structure

```
simple_rag_ch/
├── simple_rag_ch/
│   ├── config.py           # Reads SILICONFLOW_API_KEY and other settings
│   ├── embedding_api.py    # Calls SiliconFlow Embeddings API
│   ├── ingest.py           # Full-document ingestion into ChromaDB
│   ├── ingest_chunk.py     # Chunked ingestion (optional)
│   ├── infer.py            # Interactive retrieval + chat completion
│   ├── projects/           # Place your .txt files under a project folder
│   └── chroma_db/          # Vector database (auto-created)
├── .env.sh                 # Export SILICONFLOW_API_KEY (gitignored)
├── ENV_SETUP.md            # How to load and manage environment variables
├── README_original.md      # Original upstream README kept for reference
└── README.md               # This file
```

## Setup

1. Create and load your environment variables
   - Edit `.env.sh` and set your key, then:
     ```bash
     source .env.sh
     ```
   - Verify:
     ```bash
     echo $SILICONFLOW_API_KEY
     ```
   - See `ENV_SETUP.md` for details

2. Install dependencies and prepare folders
   ```bash
   bash simple_rag_ch/setup.sh
   ```

## Usage

1. Prepare data
   - Create a project folder and add `.txt` files:
     ```
     simple_rag_ch/projects/<project_name>/*.txt
     ```

2. Ingest data (choose one)
   - Full document ingest:
     ```bash
     python simple_rag_ch/ingest.py <project_name>
     ```
   - Chunked ingest (for large docs):
     ```bash
     python simple_rag_ch/ingest_chunk.py <project_name>
     ```

3. Query
   ```bash
   python simple_rag_ch/infer.py <project_name>
   ```

## Configuration

Edit `simple_rag_ch/config.py` as needed:
- `DB_PATH`: ChromaDB path
- `OPENAI_API_URL`: SiliconFlow chat completions endpoint
- `OPENAI_MODEL`: Chat model name
- `RELEVANCE_THRESHOLD`: Similarity threshold for retrieval
- `TOP_N_RESULTS`: Max retrieved docs
- `PROJECTS_DIR`: Data directory

## Security Notes

- Do not commit API keys. `.env.sh` is gitignored
- Use `SILICONFLOW_API_KEY` from the environment (no hardcoded keys)
- Rotate keys periodically

## Credits and License

- Based on the original simple_rag project (Apache 2.0)
- This repository remains under Apache License 2.0