# Crawler

**Repository Path**: DanYuJie/Crawler

## Basic Information

- **Project Name**: Crawler
- **Description**: Periodically crawls news websites
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2017-10-28
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Crawler
Periodically crawls news websites.

## Environment
```
Python version: Python 3.5
Database: MongoDB
Crawler framework: Scrapy
Backend: Flask
```
## Deployment

_Note: run all of the following commands from the project root directory._

**1. Create a virtual environment**

```
$ virtualenv env -p python3
$ . ./activate_env.sh
$ pip install -r requirements.txt
```

Before running the commands above, create `activate_env.sh` in the project root with the following content:

```
. env/bin/activate
```

**2. Edit the configuration files**

```
$ cp config.py.example config.py
$ cp instance/config.py.example instance/config.py
$ cp gunicorn.py.example gunicorn.py
```

In `gunicorn.py`, set `chdir` (the working directory) to the absolute path of the project root.
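A Gunicorn config file is a plain Python module whose top-level names map to Gunicorn settings. The sketch below shows roughly what an edited `gunicorn.py` could contain. Only `chdir` comes from this README. The path `/path/to/Crawler` is a placeholder, and the `bind`, `workers` and `loglevel` values are illustrative assumptions, not the project's real settings. Compare the sketch against `gunicorn.py.example` before using any of it.

```python
# gunicorn.py -- hypothetical sketch; check against gunicorn.py.example.
import multiprocessing

# Directory Gunicorn switches to before loading the app.
# Placeholder: replace with the absolute path of this project.
chdir = "/path/to/Crawler"

# Illustrative values (assumptions, not taken from the project).
bind = "127.0.0.1:8000"
# The Gunicorn docs suggest (2 x CPU cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1
loglevel = "info"
```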

**3. Start Redis**

```
$ redis-server
```

Skip this step if Redis is already configured to start at boot.
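To decide whether this step can be skipped, you can check whether a Redis server is already answering. The following sketch uses only the standard library. It sends Redis's inline `PING` command and expects `+PONG`. The helper name `redis_is_up` is hypothetical, and the sketch assumes Redis listens on its default `127.0.0.1:6379` without a password. A server that requires `AUTH` replies with an error instead, so the helper returns `False` in that case.

```python
import socket


def redis_is_up(host="127.0.0.1", port=6379, timeout=1.0):
    """Return True if a Redis server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Redis accepts inline commands terminated by CRLF.
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis running:", redis_is_up())
```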

**4. Start MongoDB**

```
$ mongod
```

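Skip this step if MongoDB is already configured to start at boot. The server process is `mongod`; `mongo` is only the interactive shell client.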
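A quick standard-library way to see whether the MongoDB server (`mongod`) is already running is to try a TCP connection to its default port, 27017. The helper `port_is_open` below is hypothetical and only shows that *something* is listening on the port. If `pymongo` is installed, `MongoClient(...).admin.command("ping")` is a stronger check, because it actually speaks the MongoDB protocol.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 27017 is mongod's default port.
    print("MongoDB listening:", port_is_open("127.0.0.1", 27017))
```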

**5. Start Celery**

```
$ celery -A app.celery worker --loglevel=info -B
```

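The `-B` flag runs the Celery beat scheduler inside the worker process. It is required here because beat is what triggers the periodic tasks. Only one beat instance should run at a time, so pass `-B` to a single worker only.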
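This README does not show how the crawl schedule is defined. The sketch below illustrates the general shape of a Celery beat schedule. It assumes Celery 4 or later, where the setting is `beat_schedule`; older releases call it `CELERYBEAT_SCHEDULE`. The entry name, the task path `app.tasks.crawl_news`, and the 30-minute interval are all hypothetical.

```python
from datetime import timedelta

# Hypothetical beat schedule. In Celery 4+ this dict would be assigned to
# celery.conf.beat_schedule (older versions: CELERYBEAT_SCHEDULE).
beat_schedule = {
    "crawl-news-every-30-minutes": {
        # Dotted path of the registered task to enqueue (hypothetical name).
        "task": "app.tasks.crawl_news",
        # Celery accepts a timedelta, a number of seconds, or a crontab().
        "schedule": timedelta(minutes=30),
    },
}
```

With a worker started using `-B`, the embedded beat would enqueue that task every 30 minutes, and the same worker would pick it up and run it.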

**6. Start the project**

Start the application from the project root:

```
$ gunicorn -c gunicorn.py run:app
```
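`run:app` tells Gunicorn to import the module `run` and serve its WSGI callable `app`. After Gunicorn starts, a quick smoke test is to request a URL and look at the status code. The sketch below is hypothetical: it assumes Gunicorn's default bind address `127.0.0.1:8000` (the value used when `bind` is not set in the config) and that the app has a route at `/`. Any HTTP status, even a 404, shows the server is answering. `None` means nothing answered.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def http_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # The server answered, just not with 2xx (e.g. 404 for a missing route).
        return err.code
    except OSError:
        # URLError (refused, DNS failure) and socket timeouts are OSError subclasses.
        return None


if __name__ == "__main__":
    print(http_status("http://127.0.0.1:8000/"))
```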