# deep-searcher

**Repository Path**: zhch158_admin/deep-searcher

## Basic Information

- **Project Name**: deep-searcher
- **Description**: No description available
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-31
- **Last Updated**: 2025-06-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README
### LLM Configuration

```python
config.set_provider_config("llm", "(LLMName)", "(Arguments dict)")
```

The `LLMName` can be one of the following: ["DeepSeek", "OpenAI", "XAI", "SiliconFlow", "Aliyun", "PPIO", "TogetherAI", "Gemini", "Ollama", "Novita"].

The `Arguments dict` is a dictionary that contains the necessary arguments for the LLM class.
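The provider name must match one of the supported values exactly. As an illustrative sketch (the `SUPPORTED_LLMS` list mirrors the one documented above, but the `validate_llm_provider` helper is my own, not part of deep-searcher), you could fail fast on a typo before calling `set_provider_config`:

```python
# Hypothetical helper: catch a misspelled provider name early with a clear
# error, instead of letting a later lookup fail obscurely.
SUPPORTED_LLMS = [
    "DeepSeek", "OpenAI", "XAI", "SiliconFlow", "Aliyun",
    "PPIO", "TogetherAI", "Gemini", "Ollama", "Novita",
]

def validate_llm_provider(name: str) -> str:
    """Return the name unchanged if supported, else raise a descriptive error."""
    if name not in SUPPORTED_LLMS:
        raise ValueError(
            f"Unsupported LLM provider {name!r}; choose one of {SUPPORTED_LLMS}"
        )
    return name
```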
#### OpenAI

Make sure you have prepared your OpenAI API key as an environment variable `OPENAI_API_KEY`.

```python
config.set_provider_config("llm", "OpenAI", {"model": "o1-mini"})
```

More details about OpenAI models: https://platform.openai.com/docs/models
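Most providers below follow the same pattern: export an API key, then point the config at a model. A small hypothetical helper (not part of deep-searcher) can fail fast when a required environment variable is missing:

```python
import os

def require_env(var: str) -> str:
    """Fail fast with a helpful message if a required API key is missing."""
    value = os.environ.get(var)
    if not value:
        raise EnvironmentError(f"Set the {var} environment variable first.")
    return value
```

For example, calling `require_env("OPENAI_API_KEY")` before configuring the OpenAI provider surfaces a missing key immediately rather than at first query time.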
#### Aliyun Bailian

Make sure you have prepared your Bailian API key as an environment variable `DASHSCOPE_API_KEY`.

```python
config.set_provider_config("llm", "Aliyun", {"model": "qwen-plus-latest"})
```

More details about Aliyun Bailian models: https://bailian.console.aliyun.com
#### OpenRouter

OpenRouter exposes an OpenAI-compatible API, so it is configured through the OpenAI provider with a custom `base_url`. Replace `"OPENROUTER_API_KEY"` with your actual OpenRouter API key.

```python
config.set_provider_config("llm", "OpenAI", {"model": "qwen/qwen3-235b-a22b:free", "base_url": "https://openrouter.ai/api/v1", "api_key": "OPENROUTER_API_KEY"})
```

More details about OpenRouter models: https://openrouter.ai/qwen/qwen3-235b-a22b:free
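Rather than hardcoding the key string in the arguments dict, you might read it from the environment. A minimal sketch (the `openrouter_args` helper is hypothetical, not part of deep-searcher):

```python
import os

def openrouter_args(model: str) -> dict:
    """Build an OpenAI-compatible arguments dict pointed at OpenRouter.

    Assumes OPENROUTER_API_KEY is exported in the environment; the keys
    (model, base_url, api_key) match the example config above.
    """
    return {
        "model": model,
        "base_url": "https://openrouter.ai/api/v1",
        "api_key": os.environ.get("OPENROUTER_API_KEY", ""),
    }
```

The resulting dict can be passed directly as the third argument of `set_provider_config`.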
#### DeepSeek

Make sure you have prepared your DeepSeek API key as an environment variable `DEEPSEEK_API_KEY`.

```python
config.set_provider_config("llm", "DeepSeek", {"model": "deepseek-reasoner"})
```

More details about DeepSeek: https://api-docs.deepseek.com/
#### SiliconFlow

Make sure you have prepared your SiliconFlow API key as an environment variable `SILICONFLOW_API_KEY`.

```python
config.set_provider_config("llm", "SiliconFlow", {"model": "deepseek-ai/DeepSeek-R1"})
```

More details about SiliconFlow: https://docs.siliconflow.cn/quickstart
#### TogetherAI

Make sure you have prepared your Together API key as an environment variable `TOGETHER_API_KEY`.

```python
config.set_provider_config("llm", "TogetherAI", {"model": "deepseek-ai/DeepSeek-R1"})
```

For Llama 4:

```python
config.set_provider_config("llm", "TogetherAI", {"model": "meta-llama/Llama-4-Scout-17B-16E-Instruct"})
```

You need to install together before running: `pip install together`. More details about TogetherAI: https://www.together.ai/
#### XAI Grok

Make sure you have prepared your XAI API key as an environment variable `XAI_API_KEY`.

```python
config.set_provider_config("llm", "XAI", {"model": "grok-2-latest"})
```

More details about XAI Grok: https://docs.x.ai/docs/overview#featured-models
#### Anthropic Claude

Make sure you have prepared your Anthropic API key as an environment variable `ANTHROPIC_API_KEY`.

```python
config.set_provider_config("llm", "Anthropic", {"model": "claude-sonnet-4-0"})
```

More details about Anthropic Claude: https://docs.anthropic.com/en/home
#### Gemini

Make sure you have prepared your Gemini API key as an environment variable `GEMINI_API_KEY`.

```python
config.set_provider_config("llm", "Gemini", {"model": "gemini-2.0-flash"})
```

You need to install the Google GenAI SDK before running: `pip install google-genai`. More details about Gemini: https://ai.google.dev/gemini-api/docs
#### PPIO

Make sure you have prepared your PPIO API key as an environment variable `PPIO_API_KEY`.

```python
config.set_provider_config("llm", "PPIO", {"model": "deepseek/deepseek-r1-turbo"})
```

More details about PPIO: https://ppinfra.com/docs/get-started/quickstart.html?utm_source=github_deep-searcher
#### Ollama

Follow these instructions to set up and run a local Ollama instance:

1. Download and install Ollama on one of the supported platforms (including Windows Subsystem for Linux).
2. View the list of available models via the model library.
3. Fetch a model via `ollama pull <name-of-model>`, e.g. `ollama pull qwen3`.
4. To chat directly with a model from the command line, use `ollama run <name-of-model>`.

By default, Ollama serves a REST API for running and managing models at http://localhost:11434.

```python
config.set_provider_config("llm", "Ollama", {"model": "qwen3"})
```
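Before configuring the Ollama provider, it can be useful to verify that the local server is actually listening on its default port. A best-effort sketch using only the standard library (the `ollama_reachable` helper is my own, not part of deep-searcher or the Ollama SDK):

```python
import urllib.request
import urllib.error

def ollama_reachable(base_url: str = "http://localhost:11434",
                     timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the Ollama REST endpoint."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```

If this returns `False`, start the server (e.g. by launching the Ollama app or `ollama serve`) before configuring the provider.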
#### Volcengine

Make sure you have prepared your Volcengine API key as an environment variable `VOLCENGINE_API_KEY`.

```python
config.set_provider_config("llm", "Volcengine", {"model": "deepseek-r1-250120"})
```

More details about Volcengine: https://www.volcengine.com/docs/82379/1099455?utm_source=github_deep-searcher
#### GLM

Make sure you have prepared your GLM API key as an environment variable `GLM_API_KEY`.

```python
config.set_provider_config("llm", "GLM", {"model": "glm-4-plus"})
```

You need to install zhipuai before running: `pip install zhipuai`. More details about GLM: https://bigmodel.cn/dev/welcome
#### Amazon Bedrock

Make sure you have prepared your AWS credentials as environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

```python
config.set_provider_config("llm", "Bedrock", {"model": "us.deepseek.r1-v1:0"})
```

You need to install boto3 before running: `pip install boto3`. More details about Amazon Bedrock: https://docs.aws.amazon.com/bedrock/
#### IBM watsonx.ai

Make sure you have prepared your watsonx.ai credentials as environment variables `WATSONX_APIKEY`, `WATSONX_URL`, and `WATSONX_PROJECT_ID`.

```python
config.set_provider_config("llm", "watsonx", {"model": "us.deepseek.r1-v1:0"})
```

You need to install ibm-watsonx-ai before running: `pip install ibm-watsonx-ai`. More details about IBM watsonx.ai: https://www.ibm.com/products/watsonx-ai/foundation-models
### Embedding Model Configuration

```python
config.set_provider_config("embedding", "(EmbeddingModelName)", "(Arguments dict)")
```

The `EmbeddingModelName` can be one of the following: ["MilvusEmbedding", "OpenAIEmbedding", "VoyageEmbedding", "SiliconflowEmbedding", "PPIOEmbedding", "NovitaEmbedding"].

The `Arguments dict` is a dictionary that contains the necessary arguments for the embedding model class.
#### OpenAI

Make sure you have prepared your OpenAI API key as an environment variable `OPENAI_API_KEY`.

```python
config.set_provider_config("embedding", "OpenAIEmbedding", {"model": "text-embedding-3-small"})
```

More details about OpenAI models: https://platform.openai.com/docs/guides/embeddings/use-cases
#### Azure OpenAI

Make sure you have prepared your OpenAI API key as an environment variable `OPENAI_API_KEY`.

```python
config.set_provider_config("embedding", "OpenAIEmbedding", {
    "model": "text-embedding-ada-002",
    "azure_endpoint": "https://.openai.azure.com/",
    "api_version": "2023-05-15"
})
```
#### Pymilvus built-in embedding models

Use the built-in embedding models in Pymilvus; you can set the model name to "default", "BAAI/bge-base-en-v1.5", "BAAI/bge-large-en-v1.5", "jina-embeddings-v3", etc.
See [milvus_embedding.py](deepsearcher/embedding/milvus_embedding.py) for more details.

```python
config.set_provider_config("embedding", "MilvusEmbedding", {"model": "BAAI/bge-base-en-v1.5"})
config.set_provider_config("embedding", "MilvusEmbedding", {"model": "jina-embeddings-v3"})
```

For Jina's embedding model, you need to prepare your Jina API key as an environment variable `JINAAI_API_KEY`.

You need to install the Pymilvus model package before running: `pip install "pymilvus[model]"`. More details about Pymilvus: https://milvus.io/docs/embeddings.md
#### VoyageAI

Make sure you have prepared your Voyage API key as an environment variable `VOYAGE_API_KEY`.

```python
config.set_provider_config("embedding", "VoyageEmbedding", {"model": "voyage-3"})
```

You need to install voyageai before running: `pip install voyageai`. More details about VoyageAI: https://docs.voyageai.com/embeddings/
#### Amazon Bedrock

```python
config.set_provider_config("embedding", "BedrockEmbedding", {"model": "amazon.titan-embed-text-v2:0"})
```

You need to install boto3 before running: `pip install boto3`. More details about Amazon Bedrock: https://docs.aws.amazon.com/bedrock/
#### Novita AI

Make sure you have prepared your Novita AI API key as an environment variable `NOVITA_API_KEY`.

```python
config.set_provider_config("embedding", "NovitaEmbedding", {"model": "baai/bge-m3"})
```

More details about Novita AI: https://novita.ai/docs/api-reference/model-apis-llm-create-embeddings?utm_source=github_deep-searcher&utm_medium=github_readme&utm_campaign=link
#### SiliconFlow

Make sure you have prepared your SiliconFlow API key as an environment variable `SILICONFLOW_API_KEY`.

```python
config.set_provider_config("embedding", "SiliconflowEmbedding", {"model": "BAAI/bge-m3"})
```

More details about SiliconFlow: https://docs.siliconflow.cn/en/api-reference/embeddings/create-embeddings
#### Volcengine

Make sure you have prepared your Volcengine API key as an environment variable `VOLCENGINE_API_KEY`.

```python
config.set_provider_config("embedding", "VolcengineEmbedding", {"model": "doubao-embedding-text-240515"})
```

More details about Volcengine: https://www.volcengine.com/docs/82379/1302003
#### GLM

Make sure you have prepared your GLM API key as an environment variable `GLM_API_KEY`.

```python
config.set_provider_config("embedding", "GLMEmbedding", {"model": "embedding-3"})
```

You need to install zhipuai before running: `pip install zhipuai`. More details about GLM: https://bigmodel.cn/dev/welcome
#### Gemini

Make sure you have prepared your Gemini API key as an environment variable `GEMINI_API_KEY`.

```python
config.set_provider_config("embedding", "GeminiEmbedding", {"model": "text-embedding-004"})
```

You need to install the Google GenAI SDK before running: `pip install google-genai`. More details about Gemini: https://ai.google.dev/gemini-api/docs
#### Ollama

```python
config.set_provider_config("embedding", "OllamaEmbedding", {"model": "bge-m3"})
```

You need to install ollama before running: `pip install ollama`. More details about the Ollama Python SDK: https://github.com/ollama/ollama-python
#### PPIO

Make sure you have prepared your PPIO API key as an environment variable `PPIO_API_KEY`.

```python
config.set_provider_config("embedding", "PPIOEmbedding", {"model": "baai/bge-m3"})
```

More details about PPIO: https://ppinfra.com/docs/get-started/quickstart.html?utm_source=github_deep-searcher
#### FastEmbed

```python
config.set_provider_config("embedding", "FastEmbedEmbedding", {"model": "intfloat/multilingual-e5-large"})
```

You need to install fastembed before running: `pip install fastembed`. More details about FastEmbed: https://github.com/qdrant/fastembed
#### IBM watsonx.ai

Make sure you have prepared your watsonx.ai credentials as environment variables `WATSONX_APIKEY`, `WATSONX_URL`, and `WATSONX_PROJECT_ID`.

```python
config.set_provider_config("embedding", "WatsonXEmbedding", {"model": "ibm/slate-125m-english-rtrvr-v2"})
config.set_provider_config("embedding", "WatsonXEmbedding", {"model": "sentence-transformers/all-minilm-l6-v2"})
```

You need to install ibm-watsonx-ai before running: `pip install ibm-watsonx-ai`. More details about IBM watsonx.ai: https://www.ibm.com/products/watsonx-ai/foundation-models
### Vector Database Configuration

```python
config.set_provider_config("vector_db", "(VectorDBName)", "(Arguments dict)")
```

The `VectorDBName` can be one of the following: ["Milvus", "AzureSearch"] (under development).

The `Arguments dict` is a dictionary that contains the necessary arguments for the vector database class.
#### Milvus

```python
config.set_provider_config("vector_db", "Milvus", {"uri": "./milvus.db", "token": ""})
```

More details about Milvus config:

- Setting the `uri` to a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically uses Milvus Lite to store all data in that file.
- To connect to a standalone Milvus server, use its address, e.g. `http://localhost:19530`, as your `uri`. You can also use any other connection parameters supported by Milvus, such as `host`, `user`, `password`, or `secure`.
- To use Zilliz Cloud, the fully managed cloud service for Milvus, set the `uri` and `token` according to the Public Endpoint and API Key in Zilliz Cloud.
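The connection modes above differ only in the `uri`/`token` pair passed to the provider. A hypothetical helper (the mode labels `lite`, `server`, and `zilliz` are my own, not deep-searcher terms) that assembles the arguments dict:

```python
def milvus_args(mode: str, uri: str = "", token: str = "") -> dict:
    """Build the {"uri": ..., "token": ...} dict for one deployment mode."""
    if mode == "lite":      # local file, handled by Milvus Lite
        return {"uri": uri or "./milvus.db", "token": ""}
    if mode == "server":    # standalone Milvus server
        return {"uri": uri or "http://localhost:19530", "token": token}
    if mode == "zilliz":    # Zilliz Cloud: Public Endpoint + API Key
        if not uri or not token:
            raise ValueError("Zilliz Cloud needs both an endpoint uri and an API key token")
        return {"uri": uri, "token": token}
    raise ValueError(f"unknown mode {mode!r}")
```

For example, `config.set_provider_config("vector_db", "Milvus", milvus_args("lite"))` reproduces the local-file configuration shown above.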
#### Azure AI Search

```python
config.set_provider_config("vector_db", "AzureSearch", {
    "endpoint": "https://.search.windows.net",
    "index_name": "",
    "api_key": "",
    "vector_field": ""
})
```
### File Loader Configuration

```python
config.set_provider_config("file_loader", "(FileLoaderName)", "(Arguments dict)")
```

The `FileLoaderName` can be one of the following: ["PDFLoader", "TextLoader", "UnstructuredLoader"].

The `Arguments dict` is a dictionary that contains the necessary arguments for the file loader class.
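If you load mixed document collections, you might select a loader name by file extension. A sketch (the mapping and `pick_loader` helper are illustrative, not part of deep-searcher; UnstructuredLoader is used as the catch-all since it handles many formats):

```python
from pathlib import Path

# Hypothetical extension-to-loader mapping using the documented loader names.
LOADER_BY_SUFFIX = {
    ".pdf": "PDFLoader",
    ".txt": "TextLoader",
    ".md": "TextLoader",
}

def pick_loader(path: str) -> str:
    """Return a documented loader name for the given file path."""
    return LOADER_BY_SUFFIX.get(Path(path).suffix.lower(), "UnstructuredLoader")
```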
#### Unstructured

You can use Unstructured in two ways:

- With the Unstructured API: prepare your `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL` environment variables.
- Locally, without the API.

```python
config.set_provider_config("file_loader", "UnstructuredLoader", {})
```

You need to install unstructured-ingest before running: `pip install unstructured-ingest`. For local processing, also install `pip install "unstructured[all-docs]"` for all document types, or `pip install "unstructured[pdf]"` for PDFs only.

#### Docling

```python
config.set_provider_config("file_loader", "DoclingLoader", {})
```

Currently supported file types: please refer to the Docling documentation: https://docling-project.github.io/docling/usage/supported_formats/#supported-output-formats

You need to install docling before running: `pip install docling`. More details about Docling: https://docling-project.github.io/docling/
### Web Crawler Configuration

```python
config.set_provider_config("web_crawler", "(WebCrawlerName)", "(Arguments dict)")
```

The `WebCrawlerName` can be one of the following: ["FireCrawlCrawler", "Crawl4AICrawler", "JinaCrawler"].

The `Arguments dict` is a dictionary that contains the necessary arguments for the web crawler class.
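Which crawler to configure often depends on which credentials you have available. A hypothetical selection helper (not part of deep-searcher) based on the environment variables documented for each crawler:

```python
import os

def pick_crawler(env=None) -> str:
    """Return a documented crawler name based on available credentials.

    The fallback to Crawl4AICrawler is an illustrative choice: it needs
    crawl4ai-setup but no API key.
    """
    env = os.environ if env is None else env
    if env.get("FIRECRAWL_API_KEY"):
        return "FireCrawlCrawler"
    if env.get("JINA_API_TOKEN") or env.get("JINAAI_API_KEY"):
        return "JinaCrawler"
    return "Crawl4AICrawler"
```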
#### FireCrawl

Make sure you have prepared your FireCrawl API key as an environment variable `FIRECRAWL_API_KEY`.

```python
config.set_provider_config("web_crawler", "FireCrawlCrawler", {})
```

More details about FireCrawl: https://docs.firecrawl.dev/introduction
#### Crawl4AI

Make sure you have run `crawl4ai-setup` in your environment.

```python
config.set_provider_config("web_crawler", "Crawl4AICrawler", {"browser_config": {"headless": True, "verbose": True}})
```

You need to install crawl4ai before running: `pip install crawl4ai`. More details about Crawl4AI: https://docs.crawl4ai.com/
#### Jina Reader

Make sure you have prepared your Jina Reader API key as an environment variable `JINA_API_TOKEN` or `JINAAI_API_KEY`.

```python
config.set_provider_config("web_crawler", "JinaCrawler", {})
```

More details about Jina Reader: https://jina.ai/reader/
#### Docling

```python
config.set_provider_config("web_crawler", "DoclingCrawler", {})
```

Currently supported file types: please refer to the Docling documentation: https://docling-project.github.io/docling/usage/supported_formats/#supported-output-formats

You need to install docling before running: `pip install docling`. More details about Docling: https://docling-project.github.io/docling/