157 Star 576 Fork 258

Cherokee/neocrawler

Create your Gitee Account
Explore and code with more than 12 million developers,Free private repositories !:)
Sign up
Clone or Download
contribute
Sync branch
Cancel
Notice: Creating folder will generate an empty file .keep, because not support in Git
Loading...
README
BSD-3-Clause
Copyright (c) 2014, Cherokee(successage@gmail.com) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of neocrawler nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

About

牛咖-neocrawler nodejs 的爬虫系统。 特点: 支持web界面方式的摘取规则配置(css selector & regex); 包含无界面的浏览器引擎(phantomjs),支持js产生内容的抓取; 用http代理路由的方式防止抓取并发量过大的情况下被对方屏蔽; nodejs none-block 异步环境下的抓取性能比较高; 中央调度器负责网址的调度(同一时间片内一定数量的抓取任务中根据网站的权重来决定派发任务量; 支持多种抓取实例并存,定制摘取引擎和存储方式。 expand collapse
NodeJS
BSD-3-Clause
Cancel

Releases

No release

Contributors

All

Activities

Load More
can not load any more
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
NodeJS
1
https://gitee.com/dreamidea/neocrawler.git
git@gitee.com:dreamidea/neocrawler.git
dreamidea
neocrawler
neocrawler
master

Search