# contextgem

**Repository Path**: mirrors_lepy/contextgem

## Basic Information

- **Project Name**: contextgem
- **Description**: ContextGem: Effortless LLM extraction from documents
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-21
- **Last Updated**: 2025-10-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ContextGem: Effortless LLM extraction from documents

| | |
|----------|--------|
| **Package** | [PyPI](https://pypi.org/project/contextgem/) [Downloads](https://pepy.tech/projects/contextgem) [Python](https://www.python.org/downloads/) [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) |
| **Quality** | [CI tests](https://github.com/shcherbak-ai/contextgem/actions/workflows/ci-tests.yml) [GitHub Actions](https://github.com/shcherbak-ai/contextgem/actions) [CodeQL](https://github.com/shcherbak-ai/contextgem/actions/workflows/codeql.yml) [Bandit security](https://github.com/shcherbak-ai/contextgem/actions/workflows/bandit-security.yml) [OpenSSF Best Practices](https://www.bestpractices.dev/projects/10489) |
| **Tools** | [uv](https://github.com/astral-sh/uv) [Ruff](https://github.com/astral-sh/ruff) [Pydantic](https://pydantic.dev) [Pyright](https://github.com/microsoft/pyright) [pre-commit](https://github.com/pre-commit/pre-commit) [deptry](https://github.com/fpgmaas/deptry) [Hatch](https://github.com/pypa/hatch) |
| **Docs** | [Docs build](https://github.com/shcherbak-ai/contextgem/actions/workflows/docs.yml) [Documentation](https://shcherbak-ai.github.io/contextgem/) [DeepWiki](https://deepwiki.com/shcherbak-ai/contextgem) |
| **Community** | [Code of Conduct](CODE_OF_CONDUCT.md) [Closed issues](https://github.com/shcherbak-ai/contextgem/issues?q=is%3Aissue+is%3Aclosed) [Commits](https://github.com/shcherbak-ai/contextgem/commits/main) |

Built-in abstractions | ContextGem | Other LLM frameworks* |
---|---|---|
Automated dynamic prompts | 🟢 | ◯ |
Automated data modelling and validators | 🟢 | ◯ |
Precise granular reference mapping (paragraphs & sentences) | 🟢 | ◯ |
Justifications (reasoning backing the extraction) | 🟢 | ◯ |
Neural segmentation (using wtpsplit's SaT models) | 🟢 | ◯ |
Multilingual support (I/O without prompting) | 🟢 | ◯ |
Single, unified extraction pipeline (declarative, reusable, fully serializable) | 🟢 | 🟡 |
Grouped LLMs with role-specific tasks | 🟢 | 🟡 |
Nested context extraction | 🟢 | 🟡 |
Unified, fully serializable results storage model (document) | 🟢 | 🟡 |
Extraction task calibration with examples | 🟢 | 🟡 |
Built-in concurrent I/O processing | 🟢 | 🟡 |
Automated usage & costs tracking | 🟢 | 🟡 |
Fallback and retry logic | 🟢 | 🟢 |
Multiple LLM providers | 🟢 | 🟢 |

📄 Document |
---|
Create a Document that contains text and/or visual content representing your document (contract, invoice, report, CV, etc.), from which an LLM extracts information (aspects and/or concepts). |
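
To make the idea of a document container concrete, here is a minimal plain-Python sketch. It is not ContextGem's actual `Document` class: `ToyDocument`, its fields, and the blank-line paragraph split are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document container (not the ContextGem API)."""

    raw_text: str
    # Extraction targets attached to the document; filled in by the caller.
    aspects: list = field(default_factory=list)
    concepts: list = field(default_factory=list)

    @property
    def paragraphs(self) -> list:
        # Assumption: paragraphs are separated by blank lines.
        blocks = self.raw_text.split("\n\n")
        return [block.strip() for block in blocks if block.strip()]


doc = ToyDocument(raw_text="Services Agreement\n\nThe term is two years.\n\n")
print(doc.paragraphs)
```

Keeping the extraction targets on the document object itself means the text, the targets, and later the results can travel together as one unit.
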

🔍 Aspects | 💡 Concepts |
---|---|
Define Aspects to extract text segments from the document (sections, topics, themes). You can organize aspects hierarchically and combine them with concepts for comprehensive analysis. | Define Concepts to extract specific data points with intelligent inference: entities, insights, structured objects, classifications, numerical calculations, dates, ratings, and assessments. |
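
The distinction between the two can be sketched in plain Python: an aspect targets a text segment and may nest sub-aspects and concepts, while a concept targets a typed data point. `ToyAspect`, `ToyConcept`, and the `validate` method are hypothetical names for this sketch and do not reflect ContextGem's own classes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConcept:
    """Hypothetical data point to extract, with an expected Python type."""

    name: str
    description: str
    value_type: type = str

    def validate(self, raw_value):
        # Coerce a raw value (e.g. text returned by an LLM) to the expected
        # type; raises ValueError if the value cannot be converted.
        return self.value_type(raw_value)


@dataclass
class ToyAspect:
    """Hypothetical text segment to extract; may nest concepts and sub-aspects."""

    name: str
    description: str
    concepts: list = field(default_factory=list)
    sub_aspects: list = field(default_factory=list)


term_months = ToyConcept("Term length", "Contract term in months", value_type=int)
termination = ToyAspect(
    "Termination",
    "Clauses on ending the contract",
    concepts=[term_months],
    sub_aspects=[ToyAspect("Notice period", "How much notice is required")],
)
print(term_months.validate("24"))  # prints 24
```

Attaching a concept to an aspect narrows the search: in this sketch, the term length would be looked for only within the termination clauses rather than across the whole document.
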

🔄 Alternative: Configure Extraction Pipeline |
---|
Create a reusable collection of predefined aspects and concepts that enables consistent extraction across multiple documents. |
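
One way to picture such a reusable collection is a small object that stamps the same targets onto each document. The sketch below uses plain dictionaries and a hypothetical `ToyPipeline` class with an `apply_to` method; none of these names come from ContextGem.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyPipeline:
    """Hypothetical reusable set of extraction targets (not the ContextGem API)."""

    aspects: list = field(default_factory=list)
    concepts: list = field(default_factory=list)

    def apply_to(self, document: dict) -> dict:
        # Deep-copy the targets so results written to one document
        # never leak into another document that shares the pipeline.
        document["aspects"] = copy.deepcopy(self.aspects)
        document["concepts"] = copy.deepcopy(self.concepts)
        return document


pipeline = ToyPipeline(
    aspects=[{"name": "Termination", "extracted": []}],
    concepts=[{"name": "Contract type", "extracted": []}],
)
contract_a = pipeline.apply_to({"text": "First contract ..."})
contract_b = pipeline.apply_to({"text": "Second contract ..."})

contract_a["concepts"][0]["extracted"].append("NDA")
print(contract_b["concepts"][0]["extracted"])  # prints []
```

The deep copy is the design choice that matters here: both documents get identical definitions, but each keeps its own results.
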

🤖 LLM | 🤖🤖 Alternative: LLM Group (advanced) |
---|---|
Configure a cloud or local LLM that will extract aspects and/or concepts from the document. DocumentLLM supports fallback models and role-based task routing for optimal performance. | Configure a group of LLMs with unique roles for complex extraction workflows. You can route different aspects and/or concepts to specialized LLMs (e.g., simple extraction vs. reasoning tasks). |
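
Fallback and role-based routing can be illustrated without any real model calls. In the sketch below, `ToyLLM`, `route`, the role names `"simple"` and `"reasoning"`, and the stub callables are all hypothetical stand-ins; this is not how `DocumentLLM` is implemented or configured.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyLLM:
    """Hypothetical wrapper around a model call, with an optional fallback."""

    role: str
    call: Callable[[str], str]  # stand-in for a real model request
    fallback: Optional["ToyLLM"] = None

    def run(self, prompt: str) -> str:
        try:
            return self.call(prompt)
        except Exception:
            # Primary model failed: retry on the fallback, if one is configured.
            if self.fallback is None:
                raise
            return self.fallback.run(prompt)


def route(llms: list, tasks: list) -> dict:
    """Send each (role, prompt) task to the LLM registered for that role."""
    by_role = {llm.role: llm for llm in llms}
    return {prompt: by_role[role].run(prompt) for role, prompt in tasks}


def flaky_model(prompt: str) -> str:
    # Simulates a provider outage so the fallback path is exercised.
    raise TimeoutError("simulated provider outage")


backup = ToyLLM(role="simple", call=lambda p: f"backup answered: {p}")
extractor = ToyLLM(role="simple", call=flaky_model, fallback=backup)
reasoner = ToyLLM(role="reasoning", call=lambda p: f"reasoner answered: {p}")

results = route(
    [extractor, reasoner],
    [("simple", "List the parties"), ("reasoning", "Assess the risk")],
)
print(results)
```

Here the simple-extraction task falls through to the backup model after the simulated outage, while the reasoning task goes straight to the model registered for that role.
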