# CognitiveKernel-Pro

**Repository Path**: mirrors_Tencent/CognitiveKernel-Pro

## Basic Information

- **Project Name**: CognitiveKernel-Pro
- **Description**: Deep Research Agent CognitiveKernel-Pro from Tencent AI Lab. Paper: https://arxiv.org/pdf/2508.00414
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-06
- **Last Updated**: 2025-09-06

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

[![arXiv](https://img.shields.io/badge/arXiv-2508.00414-b31b1b.svg)](https://arxiv.org/abs/2508.00414) [![Data](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-CognitiveKernel--Pro--Data-ffc107?color=ffc107&logoColor=white)](https://huggingface.co/datasets/CognitiveKernel/CognitiveKernel-Pro-Query) [![Data](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-CognitiveKernel--Pro--SFT-ffc107?color=ffc107&logoColor=white)](https://huggingface.co/datasets/CognitiveKernel/CognitiveKernel-Pro-SFT) [![Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-CognitiveKernel--Pro--Model-ffc107?color=ffc107&logoColor=white)](https://huggingface.co/CognitiveKernel/Qwen3-8B-CK-Pro)

- A state-of-the-art open-source agent utilizing (as many as possible) free tools; the only paid tool is the Google Search API, which can be replaced with the free DuckDuckGo API if needed.
- A fully reproducible open-source SFT training recipe that outperforms RL-based models like WebDancer and WebSailor, with no RL required.

## Running the Cognitive Kernel-Pro (CogKernel-Pro for short) Agent

### Environment

#### Python

- Python 3.12 is recommended.
- Dependencies:

```bash
pip install boto3 botocore openai duckduckgo_search rich numpy openpyxl biopython mammoth markdownify pandas pdfminer-six python-pptx pdf2image puremagic pydub SpeechRecognition bs4 youtube-transcript-api requests transformers protobuf langchain_openai langchain
pip install selenium helium smolagents
```

#### Web Server (Powered by Playwright)

- **On Linux**:
  - Check out the script for hosting the web engine: [./ck_web/_web/run_local.sh](./ck_web/_web/run_local.sh).
  - Dependencies:

    ```bash
    apt-get install -y poppler-utils default-jre libreoffice-common libreoffice-java-common libreoffice ffmpeg
    # for ck_web
    sh ck_pro/ck_web/_web/run_local.sh
    ```

  - **IMPORTANT**: it is recommended to run this program in a sandbox, since the generated Python code is executed directly and there are currently no safety checks. Disable sudo for your user to ensure safety:

    ```bash
    # run as root
    echo "${USER}" 'ALL=(ALL) NOPASSWD: !ALL' | tee /etc/sudoers.d/${USER}-rule
    chmod 440 /etc/sudoers.d/${USER}-rule
    deluser ${USER} sudo
    hostnamectl set-hostname localhost
    ```

- **On Mac**:
  - Check out the script for hosting the web engine: [./ck_web/_web/run_local_mac.sh](./ck_web/_web/run_local_mac.sh).
  - Dependencies:

    ```zsh
    brew install --cask libreoffice
    brew install poppler
    brew install ffmpeg
    # for ck_web
    sh ck_pro/ck_web/_web/run_local_mac.sh
    ```

  - **IMPORTANT**: it is recommended to run this program in a sandbox, since the generated Python code is executed directly and there are currently no safety checks.
Disable sudo for your user to ensure safety:

```bash
# run as root
echo "${USER}" 'ALL=(ALL) NOPASSWD: !ALL' | tee /etc/sudoers.d/${USER}-rule
chmod 440 /etc/sudoers.d/${USER}-rule
dseditgroup -o edit -d "$USER" admin
scutil --set HostName localhost
```

### Example (A simple example)

See [`ck_main/_test`](./ck_main/_test) for a simple example and its corresponding outputs.

```bash
export PYTHONPATH=/your/path/to/CogKernel-Pro
# Assume we have set up a vllm model server and a web-browser server
WEB_IP=localhost:3001  # web-browser server
LLM_URL=http://xx.xx.xx.xx:8080/v1/chat/completions  # vllm model server
#LLM_URL=gpt:gpt-4.1  # using gpt
#VLM_URL=gpt:gpt-4.1  # using gpt
#LLM_URL=claude:  # using claude
#VLM_URL=claude:  # using claude

# run the simple test
MAIN_ARGS="{'web_agent': {'model': {'call_target': '${LLM_URL}'}, 'model_multimodal': {'call_target': '${VLM_URL}'}, 'web_env_kwargs': {'web_ip': '${WEB_IP}'}}, 'file_agent': {'model': {'call_target': '${LLM_URL}'}, 'model_multimodal': {'call_target': '${VLM_URL}'}}, 'model': {'call_target': '${LLM_URL}'}}"
# use "NO_NULL_STDIN=1" for easier debugging
# you can also remove the `--input` field to directly input your task from stdin
# you can also remove the `-mpdb` flag to run the program directly instead of in debugging mode
NO_NULL_STDIN=1 python3 -u -mpdb -m ck_pro.ck_main.main --updates "${MAIN_ARGS}" --input /your/path/to/simple_test.jsonl --output /your/path/to/simple_test.output.jsonl |& tee _log_simple_test
less -R _log_simple_test  # use 'less -R' to see the colored outputs
```

### Example (Experimenting on the GAIA dataset)

```bash
# Step 1: prepare the data
# decompress the gaia data (or download it yourself from huggingface)
# -> assume all the gaia-related input files are in the same DIR as the input json meta-file
unzip /your/path/to/CogKernel-Pro/Evaluation/gaia2504.zip

# Step 2: prepare the web service (a PC or laptop is recommended for a better network connection)
# -> prepare things according to "./ck_web/_web/run_local.sh"
#LISTEN_PORT=3001 npm start
#WEB_IP=localhost:3001  # web-browser server

# Step 3: prepare a vllm instance for model calling
# use gpt
#LLM_URL=gpt:gpt-4.1
#VLM_URL=gpt:gpt-4.1
#export AZURE_OPENAI_ENDPOINT="YOUR_ENDPOINT"
#export AZURE_OPENAI_API_KEY="YOUR_API_KEY"
#export AZURE_OPENAI_API_VERSION="YOUR_API_VERSION"
# or use claude
#LLM_URL=claude:
#VLM_URL=claude:
#export AWS_ACCESS_KEY="YOUR_KEY"
#export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
LLM_URL=http://xx.xx.xx.xx:8080/v1/chat/completions  # vllm model server
VLM_URL=http://xx.xx.xx.xx:8081/v1/chat/completions  # for the VLM

# Step 4: set up the search engine
# either use the google api
#export SEARCH_BACKEND="Google"
#export SEARCH_API_KEY="YOUR_API_KEY"
#export SEARCH_CSE_ID="YOUR_CSE_ID"
# or simply use DuckDuckGo
export SEARCH_BACKEND="DuckDuckGo"

# Step 5: run
export PYTHONPATH=/your/path/to/CogKernel-Pro/
#pip install ...  # see `Environment` above
# it is more stable to start a new web browser for each web call; set WEB_PORT (the web-browser service's port) and WEB_DIR (the main dir of the web-browser service)
# moreover, it is slightly better to use non-boxed screenshots (make sure to update to the latest `server.js` and set screenshot_boxed=False)
WEB_DIR=/path/to/_web/  # where we put `server.js` and the related `node_modules`
WEB_PORT=3001
MAIN_ARGS="{'web_agent': {'model': {'call_target': '${LLM_URL}'}, 'model_multimodal': {'call_target': '${VLM_URL}'}, 'web_env_kwargs': {'web_ip': 'localhost:${WEB_PORT}', 'web_command': 'cd ${WEB_DIR}; LISTEN_PORT=${WEB_PORT} npm start', 'screenshot_boxed': False}}, 'file_agent': {'model': {'call_target': '${LLM_URL}'}, 'model_multimodal': {'call_target': '${VLM_URL}'}}, 'model': {'call_target': '${LLM_URL}'}}"
python3.12 -u -m ck_pro.ck_main.main --updates "${MAIN_ARGS}" --input /your/path/to/gaia_dev.jsonl --output /your/path/to/gaia_dev.output.jsonl |& tee -a _log_gaia_dev

# Step 6: analyze and check the output
python -m ck_pro.ck_main.scripts.analyze -f /your/path/to/output/gaia_dev.output.jsonl -b 0
```

### Extra Running Config

```bash
# call claude+thinking for the outer main agent
LLM_URL=gpt:gpt-4.1  # still use gpt-4.1 for the sub-agents
VLM_URL=gpt:gpt-4.1
export AZURE_OPENAI_ENDPOINT="YOUR_ENDPOINT"
export AZURE_OPENAI_API_KEY="YOUR_API_KEY"
export AZURE_OPENAI_API_VERSION="YOUR_API_VERSION"
export AWS_ACCESS_KEY="YOUR_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
# use claude+thinking for the main agent, allowing a larger max_tokens budget
MAIN_ARGS="{'web_agent': {'model': {'call_target': '${LLM_URL}'}, 'model_multimodal': {'call_target': '${VLM_URL}'}, 'web_env_kwargs': {'web_ip': 'localhost:${WEB_PORT}', 'web_command': 'cd ${WEB_DIR}; LISTEN_PORT=${WEB_PORT} npm start', 'screenshot_boxed': False}}, 'file_agent': {'model': {'call_target': '${LLM_URL}'}, 'model_multimodal': {'call_target': '${VLM_URL}'}}, 'model': {'thinking': 'True', 'call_target': 'claude:', 'call_kwargs': {'temperature': 0.2, 'top_p': 0.95, 'max_tokens': 4096}}}"
```

### Enabling Reflection

Extra configs required:

```bash
# configuration of the evaluator LLM
export EVALUATOR_LLM=gpt:gpt-4.1
# langchain
export AZURE_OPENAI_API_VERSION=2025-01-01-preview
export OPENAI_API_TYPE=azure_ai
export AZURE_INFERENCE_ENDPOINT=$AZURE_OPENAI_ENDPOINT
export AZURE_INFERENCE_CREDENTIAL=$AZURE_OPENAI_API_KEY
```

Extra arguments when running `ck_pro.ck_main.main`: `--inference-time-evaluation-method`, where you can choose between `no_answer` and `gpt_judge`. `no_answer` simply checks whether the agent has returned anything meaningful, while `gpt_judge` uses the LLM specified by `EVALUATOR_LLM` to evaluate the trajectory and decide whether a retry is needed.
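The two retry criteria can be illustrated with a small sketch. This is illustrative only, not the repository's implementation; `judge` stands in for a hypothetical callable wrapping the `EVALUATOR_LLM`, assumed to score a trajectory in [0, 1]:

```python
# Illustrative sketch of the two inference-time evaluation modes described
# above; not the repository's actual implementation.

def no_answer_check(final_answer: str) -> bool:
    """Return True (i.e., retry) if the agent returned nothing meaningful."""
    meaningless = {"", "none", "n/a", "unknown", "i don't know"}
    return final_answer.strip().lower() in meaningless

def should_retry(method: str, final_answer: str, judge=None) -> bool:
    """Decide whether to retry, mimicking --inference-time-evaluation-method."""
    if method == "no_answer":
        return no_answer_check(final_answer)
    if method == "gpt_judge":
        # `judge` is a hypothetical stand-in for the EVALUATOR_LLM call.
        return judge(final_answer) < 0.5
    raise ValueError(f"unknown evaluation method: {method}")
```

Combined with `--max_retry_num 3`, the task would be re-run while such a check signals a retry, for up to three attempts.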
```bash
python -u -m ck_pro.ck_main.main --updates "${MAIN_ARGS}" --inference-time-evaluation-method gpt_judge --max_retry_num 3 --input /path/to/input --output /path/to/output
```

## Data

### Saved Data Format

The format of the saved data is as follows:

- The `Session` class is used to save trajectories: [session.py](ck_pro/agents/session.py)
- The analysis script can help in understanding the data structure: [analyze.py](ck_pro/ck_main/scripts/analyze.py)

```python
# one instance per json line
INSTANCE = {
    "id": "Task ID",
    "task": "Task Description",
    "session": {  # corresponding to the Session class
        "id": "Session ID",
        "info": {...},  # other information, such as model-calling token counts
        "task": "Original Task Description",
        "steps": [  # information for each step
            {
                "step_idx": 0,
                "plan": {
                    "thought": "Model's thought",
                    "code": "Model's output code",
                    "state": {...},  # updated state
                    "llm_input": [],  # model's direct input messages
                    "llm_output": "Model's raw output",
                },
                "action": {
                    "...": ...,  # similar to plan
                    # "observation": ...,  # simple outputs from code execution
                    # if calling a sub-agent, we have a more complex structure storing the sub-agent's session
                    "observation": {  # see the AgentResult class
                        "output": "formatted outputs",
                        "log": "logs",
                        "task": "Task for the sub-agent",
                        "session": {...},
                    },
                },
            },  # step 0
            ...,  # later steps
            {
                "...": ...,  # plan and action
                "end": {  # in the final step, we may also have an ending module if configured
                    "...": ...,  # fields are similar to plan and action
                },
            },  # final step
        ],
    },
}
```

### System Prompts

Prompts are saved in the `prompts.py` file of each agent, such as [ck_pro/ck_main/prompts.py](ck_pro/ck_main/prompts.py) and [ck_web/prompts.py](ck_pro/ck_web/prompts.py). Check out the [detailed notes](ck_pro/readme.md) for more details.

## Released Data and Models

The queries and answers of Multi-hop URLQA and AgentWebQA are [here](https://huggingface.co/datasets/CognitiveKernel/CognitiveKernel-Pro-Query).
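Since the saved trajectories described above are plain JSON lines, they can be inspected without any project code. A minimal sketch (field names follow the schema in the Saved Data Format section; the function itself is not part of the repository):

```python
import json

def summarize_trajectory(path: str) -> list:
    """Return one summary line per step of each saved session,
    using the documented fields (id, session.steps[].plan.thought)."""
    summaries = []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            inst = json.loads(raw)
            for step in inst["session"]["steps"]:
                thought = step["plan"]["thought"]
                summaries.append(f"{inst['id']} step {step['step_idx']}: {thought[:60]}")
    return summaries
```

The bundled [analyze.py](ck_pro/ck_main/scripts/analyze.py) script provides a fuller view of the same structure.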
The portion of the SFT data whose licensing permits open-source release is available [here](https://huggingface.co/datasets/CognitiveKernel/CognitiveKernel-Pro-SFT). We release the checkpoint of the fine-tuned Qwen3-8B-CK-Pro [here](https://huggingface.co/CognitiveKernel/Qwen3-8B-CK-Pro).

### Trajectory Sampling

We use `gpt-4.1` to sample trajectories. Download the queries first, then run the main agent execution code as in the previous sections. You may add the extra arguments `--sampling-mode --evaluation-method llm_score --max_retry_num 3` to sample the same query up to 3 times until it succeeds.

```bash
export LISTEN_PORT=XXXX
export WEB_IP=localhost:${LISTEN_PORT}
lsof -ti tcp:$LISTEN_PORT | xargs kill -9
export LLM_URL=gpt:gpt-4.1
export VLM_URL=gpt:gpt-4.1
export AZURE_OPENAI_API_KEY="YOUR_API_KEY"
export AZURE_OPENAI_ENDPOINT="YOUR_ENDPOINT"
export AZURE_OPENAI_API_VERSION=2025-01-01-preview
export MAX_FILE_READ_TOKENS=10000
export MAX_FILE_SCREENSHOT=5
export SEARCH_BACKEND=Google
export SEARCH_API_KEY="YOUR_GOOGLE_KEY"
export SEARCH_CSE_ID="YOUR_CSE_ID"
# langchain
export EVALUATOR_LLM=gpt:gpt-4.1
export AZURE_OPENAI_API_VERSION=2025-01-01-preview
export OPENAI_API_TYPE=azure_ai
export AZURE_INFERENCE_ENDPOINT=$AZURE_OPENAI_ENDPOINT
export AZURE_INFERENCE_CREDENTIAL=$AZURE_OPENAI_API_KEY

export MAIN_ARGS="{'web_agent': {'max_steps': 25, 'model': {'call_target': '${LLM_URL}'}, 'model_multimodal': {'call_target': '${VLM_URL}'}, 'web_env_kwargs': {'web_ip': '${WEB_IP}', 'web_command': 'cd /path/to/ck_pro/ck_web/_web; LISTEN_PORT=${LISTEN_PORT} npm start'}}, 'file_agent': {'max_steps': 20, 'model': {'call_target': '${LLM_URL}'}, 'model_multimodal': {'call_target': '${VLM_URL}'}}, 'model': {'call_target': '${LLM_URL}'}, 'max_steps': 12}"
python -u -m ck_pro.ck_main.main --updates "${MAIN_ARGS}" --sampling-mode --evaluation-method llm_score --max_retry_num 3 --input /input/query.jsonl --output /output/trajectory.output.jsonl
```

### Rejection Sampling and SFT Data Post-Processing

Run [convert_sft.py](data/convert_sft.py) and choose a type of rejection sampling (`llm_judge` for the langchain LLM score or `em` for exact match).

```bash
python convert_sft.py --input_file /path/to/trajectory.output.jsonl --output_file XXX.sft.jsonl
```

## Friendly links to relevant agent works from Tencent AI Lab

- [Cognitive Kernel](https://github.com/Tencent/CogKernel) (NAACL 2025 Demo): The base version of the Cognitive Kernel-Pro agents.
- [WebVoyager](https://github.com/MinorJerry/WebVoyager) and [OpenWebVoyager](https://github.com/MinorJerry/OpenWebVoyager/) (ACL 2024 and ACL 2025): Self-improving multimodal agents.
- [WebEvolver, WebCoT](https://github.com/Tencent/SelfEvolvingAgent) (EMNLP 2025 Main and Findings): Agent post-training with a world model and cognitive-behavior injections into CoT.
- [Web Agents Rollback](https://arxiv.org/abs/2504.11788): Enhancing Web Agents with Explicit Rollback Mechanisms.
- [DocBench](https://github.com/Anni-Zou/DocBench): Data generation for document agents.
- [PersonaHub](https://github.com/tencent-ailab/persona-hub): Scaling Synthetic Data Creation with 1,000,000,000 Personas.
- [MobileGUI-RL](https://arxiv.org/abs/2507.05720): Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment.

## Cite this work

```
@misc{fang2025cognitivekernelpro,
      title={Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training},
      author={Tianqing Fang and Zhisong Zhang and Xiaoyang Wang and Rui Wang and Can Qin and Yuxuan Wan and Jun-Yu Ma and Ce Zhang and Jiaqi Chen and Xiyun Li and Hongming Zhang and Haitao Mi and Dong Yu},
      year={2025},
      eprint={2508.00414},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.00414},
}
```

#### Contact

tianqfang(at)tencent(dot)com