# zhcw

**Repository Path**: lghdb/zhcw

## Basic Information

- **Project Name**: zhcw
- **Description**: Scrape 3D data
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2018-07-18
- **Last Updated**: 2020-12-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# zhcw

#### Project Introduction
Scrapes 3D data.

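#### Software Architecture
- requirements.txt: the project dependencies
- zhcw_scraper: the project's main directory, which contains the crawler's configuration file, middlewares, pipelines, and the spider program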


#### Installation
### Step 1: Install a Python 2.7.14 environment; online tutorials cover the details

### Step 2: Create a virtual environment and activate it; this step can be skipped in a production environment
virtualenv -p python venv
source venv/bin/activate

### Step 3: Install scrapyd, a tool for deploying Scrapy projects that provides a REST API and a simple web interface for managing spiders
pip install scrapyd

### Step 4: Create a configuration file for scrapyd: add a file named scrapyd.conf in the current directory; for the available options see http://scrapyd.readthedocs.io/en/stable/config.html
scrapyd  # start the scrapyd server
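
Step 4 does not show what goes into scrapyd.conf. As a rough sketch, the helper below renders a minimal `[scrapyd]` section using option names from the scrapyd configuration documentation. The function name `render_scrapyd_conf` and the chosen values are assumptions for illustration, not settings taken from this project, and the script assumes Python 3, run separately from the Python 2.7 crawler environment.

```python
import configparser
import io


def render_scrapyd_conf(http_port=6800, bind_address="127.0.0.1", max_proc=0):
    """Return the text of a minimal scrapyd.conf with a single [scrapyd] section.

    The values are illustrative assumptions; adjust them to your setup.
    """
    parser = configparser.ConfigParser()
    parser["scrapyd"] = {
        "bind_address": bind_address,  # interface the HTTP API listens on
        "http_port": str(http_port),   # port used by the curl examples
        "eggs_dir": "eggs",            # where deployed project eggs are stored
        "logs_dir": "logs",            # where spider logs are written
        "dbs_dir": "dbs",              # where scrapyd keeps its queue databases
        "max_proc": str(max_proc),     # 0 lets scrapyd derive the limit from CPU count
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the config; redirect the output to scrapyd.conf if the values suit you.
    print(render_scrapyd_conf())
```

Saving the printed text as scrapyd.conf in the directory where `scrapyd` is started should be enough for the server to pick it up.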

### Step 5: Install scrapyd-client, a packaging tool that builds our Scrapy project into an egg and hands it to scrapyd
pip install scrapyd-client

### Step 6: Clone the crawler code and install its dependencies
git clone https://gitee.com/lghdb/zhcw.git
cd zhcw
pip install -r requirements.txt

### Step 7: Deploy the crawler; for details see https://github.com/scrapy/scrapyd-client#scrapyd-deploy
scrapyd-deploy -a -p zhcw_scraper --version 2.0
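
`scrapyd-deploy` reads its targets from the `[deploy]` and `[deploy:<name>]` sections of the project's scrapy.cfg, and `-a` deploys to all of them. The sketch below lists the targets found in such a file before deploying. The `SAMPLE_CFG` text is a hypothetical example, not this repository's actual scrapy.cfg, and `list_deploy_targets` is a helper name invented here; Python 3 is assumed.

```python
import configparser

# Hypothetical scrapy.cfg content, for illustration only.
SAMPLE_CFG = """
[settings]
default = zhcw_scraper.settings

[deploy]
url = http://localhost:6800/
project = zhcw_scraper
"""


def list_deploy_targets(cfg_text):
    """Map each deploy target name in a scrapy.cfg text to its (url, project)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section == "deploy":
            name = "default"                    # the unnamed target
        elif section.startswith("deploy:"):
            name = section.split(":", 1)[1]     # a named target
        else:
            continue                            # not a deploy section
        targets[name] = (
            parser.get(section, "url", fallback=None),
            parser.get(section, "project", fallback=None),
        )
    return targets


if __name__ == "__main__":
    for name, (url, project) in list_deploy_targets(SAMPLE_CFG).items():
        print(name, url, project)
```

If a target's URL does not match the address where scrapyd is listening, the deploy in this step will likely fail, so checking it first can save a round trip.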

### Step 8: Run the spider
Reference: https://scrapyd.readthedocs.io/en/latest/api.html

Start the spider:

curl http://localhost:6800/schedule.json -d project=zhcw_scraper -d spider=zhcw -d _version=2.0

Stop the spider:

curl http://localhost:6800/cancel.json -d project=zhcw_scraper -d job=971ebb1e8af611e893b3000c29ef5cb2

Note: replace the parameters above with the correct values for your setup.
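
The same schedule and cancel calls can be made from a script, which makes it easier to pass the job ID returned by `schedule.json` on to `cancel.json`. This is a minimal sketch using only the Python 3 standard library. The helper names (`build_request`, `schedule_request`, `cancel_request`, `send`) are invented for illustration, and the base URL assumes scrapyd is running locally on port 6800 as in the curl examples.

```python
import json
import urllib.parse
import urllib.request

SCRAPYD_URL = "http://localhost:6800"  # assumed: same host and port as the curl examples


def build_request(endpoint, **params):
    """Build a POST request for a scrapyd JSON endpoint without sending it."""
    url = "{}/{}".format(SCRAPYD_URL, endpoint)
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def schedule_request(project, spider, version=None):
    """Request that schedules a spider run, optionally pinned to a project version."""
    params = {"project": project, "spider": spider}
    if version is not None:
        params["_version"] = version
    return build_request("schedule.json", **params)


def cancel_request(project, job_id):
    """Request that cancels the job with the given ID."""
    return build_request("cancel.json", project=project, job=job_id)


def send(request):
    """Send the request to a running scrapyd server and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With scrapyd running, `reply = send(schedule_request("zhcw_scraper", "zhcw", "2.0"))` should return a JSON object whose `jobid` field can then be passed to `send(cancel_request("zhcw_scraper", reply["jobid"]))`.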