
九艺 / Universal Web Crawler

License: BSD-3-Clause

Universal Web Crawler

Introduction

A simple, high-performance web crawler.

Software Architecture

Directory layout:
universal-web-crawler
├── bin
│   ├── WebCrawler crawler executable
│   └── WebCrawler.cfg configuration file
├── docs documentation
│   ├── 流程图1.jpg flowchart 1
│   └── 流程图2.jpg flowchart 2
├── LICENSE
├── plugins plugins
│   ├── DomainLimit.cpp domain restriction
│   ├── DomainLimit.h
│   ├── HeaderFilter.cpp header filter
│   ├── HeaderFilter.h
│   ├── HeaderFilter.mak
│   ├── MaxDepth.cpp recursion depth limit
│   ├── MaxDepth.h
│   ├── MaxDepth.mak
│   ├── mkall make build script
│   ├── SaveHTMLToFile.cpp saves HTML pages to files
│   ├── SaveHTMLToFile.h
│   ├── SaveHTMLToFile.mak
│   ├── SaveImageToFile.cpp saves images to files
│   ├── SaveImageToFile.h
│   └── SaveImageToFile.mak
├── README.md
└── src
    ├── BloomFilter.cpp Bloom filter
    ├── BloomFilter.h
    ├── Configurator.cpp configurator
    ├── Configurator.h
    ├── DnsThread.cpp DNS resolution thread class
    ├── DnsThread.h
    ├── Hash.cpp hash class
    ├── Hash.h
    ├── Http.h
    ├── Log.cpp logging class
    ├── Log.h
    ├── Main.cpp main file (entry point)
    ├── Makefile make project file
    ├── MultiIo.cpp I/O multiplexer
    ├── MultiIo.h
    ├── Plugin.h
    ├── PluginMngr.cpp plugin manager
    ├── PluginMngr.h
    ├── Precompile.h
    ├── RecvThread.cpp receive thread
    ├── RecvThread.h
    ├── SendThread.cpp send thread
    ├── SendThread.h
    ├── Socket.cpp socket class
    ├── Socket.h
    ├── StrKit.cpp string-processing utilities
    ├── StrKit.h
    ├── Thread.cpp abstract thread class
    ├── Thread.h
    ├── Url.cpp URL class
    ├── UrlFilter.h
    ├── Url.h
    ├── UrlQueues.cpp URL queues
    ├── UrlQueues.h
    ├── WebCrawler.cpp crawler class
    └── WebCrawler.h
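The tree lists a Bloom filter (src/BloomFilter.cpp) and a hash class (src/Hash.cpp), which a crawler commonly uses to skip URLs it has already queued. The following is a minimal C++17 sketch of that idea under stated assumptions: the class name `UrlBloomFilter`, the bit-array size, the number of probes and the FNV-1a double-hashing scheme are all hypothetical and are not taken from this repository's sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical sketch: sizes and hash scheme are assumptions, not taken
// from src/BloomFilter.cpp or src/Hash.cpp.
class UrlBloomFilter {
public:
    // false: the URL was definitely never inserted.
    // true:  the URL was probably inserted (false positives are possible).
    bool mayContain(std::string_view url) const {
        for (std::size_t i = 0; i < kNumHashes; ++i) {
            if (!bits_[index(url, i)]) return false;
        }
        return true;
    }

    // Marks the URL as seen by setting its kNumHashes bits.
    void insert(std::string_view url) {
        for (std::size_t i = 0; i < kNumHashes; ++i) {
            bits_[index(url, i)] = true;
        }
    }

private:
    static constexpr std::size_t kNumBits = std::size_t{1} << 20;  // about 1M bits (128 KiB)
    static constexpr std::size_t kNumHashes = 4;                    // probes per URL

    // 64-bit FNV-1a with a seed XORed into the offset basis, so one
    // algorithm yields two different hash values.
    static std::uint64_t fnv1a(std::string_view s, std::uint64_t seed) {
        std::uint64_t h = 14695981039346656037ull ^ seed;
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ull;
        }
        return h;
    }

    // Double hashing: probe i lands at (h1 + i * h2) mod kNumBits.
    static std::size_t index(std::string_view url, std::size_t i) {
        assert(i < kNumHashes);
        const std::uint64_t h1 = fnv1a(url, 0);
        // An odd step keeps the probes distinct because kNumBits is a power of two.
        const std::uint64_t h2 = fnv1a(url, 0x9e3779b97f4a7c15ull) | 1;
        return static_cast<std::size_t>((h1 + i * h2) % kNumBits);
    }

    std::vector<bool> bits_ = std::vector<bool>(kNumBits, false);
};
```

A negative answer from `mayContain` is definitive, while a positive one may be a false positive, so a crawler built this way occasionally skips a URL it has never fetched in exchange for constant memory per crawl.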
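The plugins directory includes a recursion depth limit (MaxDepth) and a domain restriction (DomainLimit), and src contains UrlFilter.h. Below is a minimal sketch of how such rules can be chained so that a URL is enqueued only if every rule accepts it. `CrawlUrl`, `UrlFilterBase`, `MaxDepthFilter`, `DomainLimitFilter` and `FilterChain` are hypothetical names, and this interface does not reproduce the project's actual Plugin.h or UrlFilter.h.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical URL record; the project's src/Url.h may differ.
struct CrawlUrl {
    std::string host;   // e.g. "www.example.com"
    std::string path;   // e.g. "/index.html"
    std::size_t depth;  // link hops from the seed URL
};

// Hypothetical filter interface (not the project's UrlFilter.h).
class UrlFilterBase {
public:
    virtual ~UrlFilterBase() = default;
    virtual bool accept(const CrawlUrl& url) const = 0;
};

// Rejects URLs deeper than a configured limit (in the spirit of plugins/MaxDepth).
class MaxDepthFilter : public UrlFilterBase {
public:
    explicit MaxDepthFilter(std::size_t maxDepth) : maxDepth_(maxDepth) {}
    bool accept(const CrawlUrl& url) const override { return url.depth <= maxDepth_; }
private:
    std::size_t maxDepth_;
};

// Keeps the allowed domain and its subdomains (in the spirit of plugins/DomainLimit).
class DomainLimitFilter : public UrlFilterBase {
public:
    explicit DomainLimitFilter(std::string domain) : domain_(std::move(domain)) {
        assert(!domain_.empty());
    }
    bool accept(const CrawlUrl& url) const override {
        const std::string& h = url.host;
        if (h == domain_) return true;
        // Subdomain: host ends with "." + domain_, so "badexample.com" is rejected.
        return h.size() > domain_.size() &&
               h.compare(h.size() - domain_.size(), domain_.size(), domain_) == 0 &&
               h[h.size() - domain_.size() - 1] == '.';
    }
private:
    std::string domain_;
};

// Accepts a URL only if every registered filter accepts it.
class FilterChain {
public:
    void add(std::unique_ptr<UrlFilterBase> filter) { filters_.push_back(std::move(filter)); }
    bool accept(const CrawlUrl& url) const {
        for (const auto& f : filters_) {
            if (!f->accept(url)) return false;
        }
        return true;
    }
private:
    std::vector<std::unique_ptr<UrlFilterBase>> filters_;
};
```

Keeping each rule in its own class mirrors the one-plugin-per-rule layout of the plugins directory: adding a rule means registering another filter rather than editing the chain.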

Installation

Build commands

  1. cd src/ && make && cd ..
  2. cd plugins/ && ./mkall && cd ..

Usage

Launch command

  1. cd bin && ./WebCrawler
BSD 3-Clause License

Copyright (c) 2024, lanyun777
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Languages: C++ and others (3 in total)

Repository: lanyun777/universal-web-crawler (default branch: master)
Clone over HTTPS: https://gitee.com/lanyun777/universal-web-crawler.git
Clone over SSH: git@gitee.com:lanyun777/universal-web-crawler.git
