# enterprise-rag-system **Repository Path**: hacker__007/enterprise-rag-system ## Basic Information - **Project Name**: enterprise-rag-system - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2026-04-20 - **Last Updated**: 2026-04-20 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # 🎯 Enterprise RAG System
![Python](https://img.shields.io/badge/python-3.10+-blue.svg) ![License](https://img.shields.io/badge/license-MIT-green.svg) ![Status](https://img.shields.io/badge/status-active-success.svg) ![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg) **Production-grade Retrieval-Augmented Generation pipeline for enterprise knowledge bases** [Features](#-features) β€’ [Demo](#-demo) β€’ [Quick Start](#-quick-start) β€’ [Architecture](#-architecture) β€’ [Documentation](#-documentation) β€’ [Contributing](#-contributing)
--- ## 🎯 Problem Statement Modern enterprises face critical challenges in knowledge management: - πŸ“š Information scattered across multiple document formats (PDF, Markdown, Confluence, Notion) - πŸ” Traditional keyword search fails to capture semantic meaning - πŸ€– Generic LLMs lack domain-specific knowledge and hallucinate - ⚑ Latency and accuracy requirements for production deployments - πŸ’° Cost optimization for large-scale document processing **This RAG system solves these problems with a production-ready, scalable architecture.** --- ## ✨ Features ### πŸ”₯ Core Capabilities - **πŸ“„ Multi-Format Document Support** - PDF, Markdown, Docx, HTML, Confluence, Notion - Intelligent chunking with semantic awareness - Metadata extraction and preservation - **πŸ” Hybrid Search Engine** - Semantic search using state-of-the-art embeddings - BM25 keyword search for exact matches - Reciprocal Rank Fusion (RRF) for optimal results - **🧠 Advanced RAG Techniques** - Query expansion and decomposition - Context compression with LLMChain - Re-ranking with Cross-Encoder models - Multi-query retrieval for comprehensive answers - **⚑ Performance Optimized** - Vector database caching and indexing - Async processing for high throughput - Query result caching with Redis - <3s response time for 95th percentile queries - **πŸ“Š Observability & Monitoring** - LangSmith integration for debugging - Arize Phoenix for production monitoring - Answer relevancy scoring (RAGAS metrics) - Cost tracking per query - **πŸ”’ Enterprise-Ready** - Authentication and authorization - Multi-tenancy support - Audit logging - PII detection and redaction --- ## πŸŽ₯ Demo ### Web Interface (Streamlit) ![Demo GIF](docs/images/demo.gif) ### API Usage ```bash curl -X POST http://localhost:8000/query \ -H "Content-Type: application/json" \ -d '{ "query": "What is our company policy on remote work?", "collection": "hr-policies", "top_k": 5 }' ``` ### Response Example ```json { "answer": "According to our Employee Handbook (section 3.2), remote work is...", "sources": [ { "document": "employee-handbook-2024.pdf", "page": 12, "relevance_score": 0.89, "text": "Remote work policy excerpt..." } ], "confidence": 0.87, "latency_ms": 2341, "tokens_used": 1245 } ``` --- ## πŸ—οΈ Architecture ### System Overview ```mermaid graph TB A[User Query] --> B[Query Processor] B --> C{Query Type} C -->|Simple| D[Hybrid Search] C -->|Complex| E[Multi-Query Retrieval] D --> F[Vector DB: Pinecone] E --> F D --> G[BM25 Search] E --> G F --> H[RRF Fusion] G --> H H --> I[Re-Ranker] I --> J[Context Compressor] J --> K[LLM: GPT-4/Claude] K --> L[Answer + Citations] L --> M[Response Cache] M --> N[User] style A fill:#e1f5ff style N fill:#e1f5ff style K fill:#ffe1e1 style F fill:#fff4e1 ``` ### Component Stack | Layer | Technology | Purpose | |-------|-----------|---------| | **Ingestion** | Unstructured.io, PyPDF2, Pandoc | Document parsing | | **Chunking** | LangChain RecursiveCharacterTextSplitter | Semantic segmentation | | **Embedding** | OpenAI Ada-002, Cohere Embed v3 | Vector representation | | **Vector Store** | Pinecone, Weaviate, FAISS | Similarity search | | **Search** | BM25, Dense retrieval, Hybrid | Query processing | | **LLM** | GPT-4, Claude 3, Gemini Pro | Answer generation | | **Orchestration** | LangChain, LangGraph | Pipeline management | | **API** | FastAPI, Pydantic | RESTful interface | | **UI** | Streamlit | Interactive demo | | **Monitoring** | LangSmith, Arize Phoenix | Observability | --- ## πŸš€ Quick Start ### Prerequisites - Python 3.10+ - Docker & Docker Compose (optional, recommended) - OpenAI/Anthropic API key - Pinecone account (free tier available) ### Installation #### Option 1: Docker (Recommended) ```bash # Clone repository git clone https://github.com/jinno-ai/enterprise-rag-system.git cd enterprise-rag-system # Configure environment cp .env.example .env # Edit .env with your API keys # Start services docker-compose up -d # Access the app # API: http://localhost:8000 # UI: http://localhost:8501 ``` #### Option 2: Local Setup ```bash # Clone repository git clone https://github.com/jinno-ai/enterprise-rag-system.git cd enterprise-rag-system # Create virtual environment python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate # Install dependencies pip install -r requirements.txt # Configure environment cp .env.example .env # Edit .env with your API keys # Initialize database python scripts/init_vectordb.py # Start API server uvicorn app.main:app --reload --port 8000 # In another terminal, start UI streamlit run ui/app.py ``` ### Ingest Your Documents ```bash # Ingest local documents python scripts/ingest.py --source ./data/documents --collection my-docs # Ingest from Notion python scripts/ingest.py --source notion --notion-token YOUR_TOKEN --collection notion-kb # Ingest from Confluence python scripts/ingest.py --source confluence --space-key MYSPACE --collection confluence-docs ``` --- ## πŸ“Š Performance Benchmarks Tested on 10,000 enterprise documents (50M tokens): | Metric | Value | Notes | |--------|-------|-------| | **Answer Relevancy** | 85.3% | RAGAS score on test set | | **Faithfulness** | 91.2% | No hallucination rate | | **Latency (p50)** | 1.8s | Median response time | | **Latency (p95)** | 2.9s | 95th percentile | | **Throughput** | 150 QPS | With caching enabled | | **Cost per Query** | $0.03 | Using GPT-4 Turbo | | **Accuracy vs Baseline** | +40% | Compared to naive RAG | ### Comparison with Other Solutions | Feature | This System | LlamaIndex | Haystack | |---------|------------|------------|----------| | Hybrid Search | βœ… | ❌ | βœ… | | Query Decomposition | βœ… | ⚠️ | ❌ | | Multi-Tenancy | βœ… | ❌ | ⚠️ | | Production Ready | βœ… | ⚠️ | βœ… | | Observability | βœ… | ⚠️ | βœ… | --- ## πŸ› οΈ Configuration ### Environment Variables ```bash # LLM Configuration OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-... # Vector Database PINECONE_API_KEY=... PINECONE_ENVIRONMENT=us-west1-gcp # Embedding Model EMBEDDING_MODEL=text-embedding-ada-002 EMBEDDING_DIMENSION=1536 # Search Configuration HYBRID_SEARCH_ALPHA=0.5 # 0=keyword only, 1=semantic only TOP_K_RESULTS=5 RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-12-v2 # Performance ENABLE_CACHING=true CACHE_TTL_SECONDS=3600 MAX_WORKERS=4 # Monitoring LANGSMITH_API_KEY=... LANGSMITH_PROJECT=enterprise-rag ARIZE_API_KEY=... ``` --- ## πŸ“– Documentation - [πŸ“š Full Documentation](docs/README.md) - [πŸ—οΈ Architecture Deep Dive](docs/architecture.md) - [πŸ”§ Configuration Guide](docs/configuration.md) - [πŸš€ Deployment Guide](docs/deployment.md) - [πŸ§ͺ Evaluation Methodology](docs/evaluation.md) - [🀝 API Reference](docs/api.md) --- ## πŸ§ͺ Testing ```bash # Run unit tests pytest tests/unit # Run integration tests pytest tests/integration # Run end-to-end tests pytest tests/e2e # Generate coverage report pytest --cov=app tests/ ``` --- ## πŸ—ΊοΈ Roadmap ### βœ… Completed - [x] Core RAG pipeline with hybrid search - [x] Multi-format document ingestion - [x] FastAPI REST API - [x] Streamlit UI - [x] Docker deployment - [x] LangSmith integration ### 🚧 In Progress - [ ] GraphRAG for entity relationships - [ ] Agentic RAG with tool calling - [ ] Advanced caching strategies - [ ] Multi-modal support (images, tables) ### πŸ“‹ Planned - [ ] Fine-tuned embedding models - [ ] Query intent classification - [ ] Conversational memory - [ ] Kubernetes deployment - [ ] Evaluation dashboard --- ## 🀝 Contributing Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for details. 1. Fork the repository 2. Create your feature branch (`git checkout -b feature/amazing-feature`) 3. Commit your changes (`git commit -m 'Add amazing feature'`) 4. Push to the branch (`git push origin feature/amazing-feature`) 5. Open a Pull Request --- ## πŸ“„ License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. --- ## πŸ™ Acknowledgments - [LangChain](https://github.com/langchain-ai/langchain) for RAG orchestration - [Pinecone](https://www.pinecone.io/) for vector database - [Arize AI](https://arize.com/) for observability - The open-source AI community --- ## πŸ“ž Contact **Jinno** - AI Engineer specializing in LLM applications - 🐦 Twitter: [@jinno_ai](https://twitter.com/jinno_ai) - πŸ’Ό LinkedIn: [jinno-ai](https://linkedin.com/in/jinno-ai) - πŸ“§ Email: contact@jinno-ai.dev - 🌐 Portfolio: [jinno-ai.dev](https://jinno-ai.dev) ---
⭐️ **If you find this project helpful, please consider giving it a star!** ⭐️ Made with ❀️ by [Jinno](https://github.com/jinno-ai)